Abstract. This paper presents a theory of skiplists of arbitrary height, 
and shows decidability of the satisfiability problem for quantifier-free 
formulas. 

A skiplist is an imperative software data structure that implements sets 
by maintaining several levels of ordered singly-linked lists in memory, 
where each level is a sublist of its lower levels. Skiplists are widely used 
in practice because they offer a performance comparable to balanced 
binary trees, and can be implemented more efficiently. To achieve this 
performance, most implementations dynamically increment the height 
(the number of levels). Skiplists are difficult to reason about because of 
the dynamic size (number of nodes) and the sharing between the different 
layers. Furthermore, reasoning about dynamic height adds the challenge 
of dealing with arbitrarily many levels. 

The first contribution of this paper is the theory TSL, which allows expressing 
the heap memory layout of a skiplist of arbitrary height. The 
second contribution is a decision procedure for the satisfiability problem 
of quantifier-free TSL formulas. The last contribution is to illustrate 
the formal verification of a practical skiplist implementation using this 
decision procedure. 



1 Introduction 

A skiplist [8] is a data structure that implements sets, maintaining several sorted 
singly-linked lists in memory. Skiplists are structured in levels, where each level 
consists of a singly-linked list. Each node in a skiplist stores a value and at least 
the pointer corresponding to the list at the lowest level. Some nodes also contain 
pointers at higher levels, pointing to the next node present at that level. The 
"skiplist property" establishes that the lowest level (backbone) list is ordered, 
and that the list at level i + 1 is a sublist of the list at level i. Search in skiplists is 
(probabilistically) logarithmic. The advantage of skiplists compared to balanced 
search trees is that skiplists are simpler and more efficient to implement. 

Consider the skiplist layout in Fig. 1. Higher-level pointers allow the search to 
skip many elements of the backbone list. A search is performed from left 
to right in a top-down fashion, progressing as far as possible in a level before 
descending. Fig. 1 shows in red the nodes traversed when looking for value 88. The 
search starts at level 3 of node head, which points to node tail, which stores value 
$+\infty$, greater than 88. Consequently, the search continues at head by moving 
down one level to level 2. The successor of head at level 2 stores value 22, which 
is smaller than 88. Hence, the search continues at level 2 from the node storing 
22 until a value greater than 88 is found. The expected logarithmic search of 
skiplists follows from the probability of a node being present at a certain level 
decreasing by a factor of 1/2 as the level increases (see [8] for an analysis of the 
running time of skiplists). 

Fig. 1. A skiplist with 4 levels, and the traversal searching 88 (in red). 
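
The top-down traversal described above can be sketched in Python. This is a minimal illustration, not the implementation studied in this paper: the names Node, build and search are hypothetical, and the stored values and node heights are made up for the example (they are not those of Fig. 1).

```python
import math

class Node:
    """A skiplist node: a value plus one successor pointer per level."""
    def __init__(self, val, height):
        self.val = val
        self.next = [None] * height  # next[i] is the successor at level i

def build(items, max_height):
    """Link (value, height) pairs, sorted by value, between two sentinels."""
    head = Node(-math.inf, max_height)
    tail = Node(math.inf, max_height)
    last = [head] * max_height       # last node linked so far at each level
    for val, height in items:
        node = Node(val, height)
        for i in range(height):
            last[i].next[i] = node
            last[i] = node
    for i in range(max_height):
        last[i].next[i] = tail
    return head

def search(head, v):
    """Return (found, values of the nodes the search moved through)."""
    visited = []
    pred = head
    for i in reversed(range(len(head.next))):   # top level first
        curr = pred.next[i]
        while curr.val < v:                     # advance within level i
            pred = curr
            visited.append(pred.val)
            curr = pred.next[i]
        if curr.val == v:
            return True, visited
    return False, visited

# Illustrative contents only: (value, number of levels the node takes part in).
ITEMS = [(2, 1), (8, 2), (14, 1), (16, 3), (22, 3), (50, 1), (88, 2), (95, 1)]
HEAD = build(ITEMS, 4)
```

With these illustrative contents, searching for 88 only moves through the nodes storing 16 and 22 before finding the value one level down, which is the skipping effect that the higher levels are meant to provide.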

In practice, implementations of skiplists vary the height dynamically, maintaining 
a variable that stores the current highest level of any node in the skiplist. 
The theory TSL presented in this paper allows verification conditions of skiplists 
of unbounded height (as indicated by this variable) to be proved automatically. 

We are interested in the formal verification of implementations of skiplists, 
which requires reasoning about unbounded mutable data stored in the heap. One 
popular approach to the verification of heap programs is Separation Logic [10]. 
Skiplists, however, are problematic for separation-like approaches due to the 
aliasing and memory sharing between nodes at different levels. Most of the work 
in formal verification of pointer programs follows program logics in the Hoare 
tradition, either using separation logic or with specialized program logics to deal 
with the heap and pointer structures [1,4,5,13]. Our approach is complementary, 
consisting of the design of specialized decision procedures for memory layouts 
which can be incorporated into a reasoning system for proving temporal properties, 
in the style of Manna-Pnueli [6]. In particular, for proving liveness properties 
we advocate the use of general verification diagrams [2], which allow a clean 
separation of the temporal reasoning from the reasoning about the data being 
manipulated. Proofs (of both safety and liveness properties) are ultimately 
decomposed into verification conditions (VCs) in the underlying theory of state 
assertions. This paper studies the automatic verification of VCs involving the 
manipulation of skiplist memory layouts. For illustration purposes we restrict 
the presentation in this paper to safety properties. 

Logics like [1,4,13] are very powerful for describing pointer structures, but they 
require the use of quantifiers to reach their expressive power. Hence, these logics 
cannot be combined, using methods like Nelson-Oppen [7] or BAPA [3], with 
theories for other aspects of the program state. Instead, our solution uses specific 
theories of memory layouts [9,11,12] that allow powerful properties to be expressed 
in the quantifier-free fragment using built-in predicates. 
For example, in [12] we presented TSL_K, a family of theories of skiplists of 
fixed height, which are unrolled into the theory of ordered singly-linked lists [11]. 
Limiting the height of the skiplist (for example to a maximum of 32 levels) would 
enable the use of TSL_K for the verification of such implementations but, unfortunately, 
the model search involved in the automatic proofs of TSL_K VCs is only practical 
for much lower heights. Handling dynamic height was still an open problem that 
precluded the verification of practical skiplist implementations. We solve this 
open problem here with TSL. The theory TSL we present in this paper allows us 
to reduce the verification of a skiplist of arbitrary height to verification conditions 
of TSL_K, where the value of K is small and independent of the skiplist height in 
any state of any implementation. 

The rest of the paper is structured as follows. Section 2 presents a running 
example of a program that manipulates skiplists. Section 3 introduces TSL: the 
theory of skiplists of arbitrary height. Section 4 includes the decidability proof. 
Section 5 provides some examples of the use of TSL in the verification of skiplists. 
Finally, Section 6 concludes the paper. Some proofs are omitted due to space 
limitations and are included in the appendix. 



2 A Skiplist Implementation 

Fig. 2 shows the pseudo-code of a sequential implementation of a skiplist, whose 
basic classes are Node and SkipList. Each node stores a key (in the field key) for 
keeping the list ordered, a field val containing the actual value stored, and a field 
next: an array of arbitrary length containing the addresses of the following nodes 
at each level. An entry in next at index i points to the successor node at level 
i. Given an object sl of class SkipList, we use sl.head, sl.tail and sl.maxLevel 
for the data members storing the head node, the tail node and the maximum 
level in use (resp.). When the SkipList object sl is clear from the context, we use 
head, tail and maxLevel instead of sl.head, sl.tail and sl.maxLevel. The program 
in Fig. 2 allows executions in which the height of a skiplist, as stored in maxLevel, 
can grow beyond any bound. Finally, nodes contain a ghost field level storing 
the highest level of next. We use the @ symbol to denote a ghost field and boxes 
to describe ghost code. This extra "ghost" code is only added for verification 
purposes and does not influence the execution of the existing program (it does not 
affect the control flow or non-ghost data), and it is removed during compilation. 
Objects of SkipList maintain one ghost field reg to represent the region of the 
heap (set of addresses) managed by the skiplist. In this implementation, head 
and tail are sentinel nodes for the first and last nodes of the list, initialized 
with key = $-\infty$ and key = $+\infty$ (resp.). These nodes are not removed during 
the execution and their key field remains unchanged. The amount of ghost code 
introduced for verification is very small, containing only the book-keeping of the 
region reg. 

Fig. 2 shows the algorithms for insertion (Insert), search (Search) and 
removal (Remove). Fig. 2 also shows the most general client MGC, a program 
that non-deterministically performs calls to skiplist operations. In this 
implementation, we assume that program execution begins after an empty 
skiplist, containing only the head and tail nodes at level 0, has already been created. 
New nodes are then added using the Insert operation. Since MGC can execute 
all possible sequences of calls, it can be used to verify properties like method 
termination or skiplist-shape preservation. The program updates the ghost field 
reg to represent the set of nodes that forms the skiplist at every state. That is: 
(a) a new node becomes part of the skiplist as soon as it is connected at level 0 
in Insert (line 36); and (b) a node that is being removed stops being part of the 
skiplist when it is disconnected at level 0 in Remove (line 74). For simplicity, 
we assume in this paper that the fields val and key within an object of type 
Node contain the same object. A crucial property that we wish to prove of this 
implementation is that the memory layout maintained by the algorithm is that 
of a "skiplist": the lowest level is an ordered acyclic singly-linked list, every level 
is a subset of the levels below it, and all the elements stored are precisely those stored 
in addresses contained in region reg. 

 1: procedure MGC(SkipList sl) 
 2:   while true do 
 3:     v := NondetPickValue 
 4:     nondet 
 5:       call Insert(sl, v) 
          or 
          call Search(sl, v) 
          or 
          call Remove(sl, v) 
 6:   end while 
 7: end procedure 

 8: procedure Insert(SkipList sl, Value v) 
 9:   Array(Node*) upd 
10:   Int lvl := randomLevel 
11:   Bool valueWasIn := false 
12:   if lvl > sl.maxLevel then 
13:     for i := (sl.maxLevel + 1) to lvl do 
14:       sl.head.next[i] := sl.tail 
15:       sl.tail.next[i] := null 
16:     end for 
17:     sl.maxLevel := lvl 
18:   end if 
19:   Node* pred := sl.head 
20:   Node* curr := pred.next[sl.maxLevel] 
21:   Int i := sl.maxLevel 
22:   while 0 ≤ i ∧ ¬valueWasIn do 
23:     curr := pred.next[i] 
24:     while curr.val < v do 
25:       pred := curr 
26:       curr := pred.next[i] 
27:     end while 
28:     upd[i] := pred 
29:     i := i − 1 
30:     valueWasIn := (curr.val = v) 
31:   end while 
32:   if ¬valueWasIn then 
33:     x := CreateNode(lvl, v) 
34:     for i := 0 to lvl do 
35:       x.next[i] := upd[i].next[i] 
36:       upd[i].next[i] := x 
          +----------------------------------------+ 
          | if i = 0 then sl.reg := sl.reg ∪ {x}   | 
          +----------------------------------------+ 
37:     end for 
38:   end if 
39:   return ¬valueWasIn 
40: end procedure 

41: procedure Search(SkipList sl, Value v) 
42:   Int i := sl.maxLevel 
43:   Node* pred := sl.head 
44:   Node* curr := pred.next[i] 
45:   while 0 ≤ i ∧ curr.val ≠ v do 
46:     curr := pred.next[i] 
47:     while curr.val < v do 
48:       pred := curr 
49:       curr := pred.next[i] 
50:     end while 
51:     i := i − 1 
52:   end while 
53:   return curr.val = v 
54: end procedure 

55: procedure Remove(SkipList sl, Value v) 
56:   Array(Node*)[sl.maxLevel + 1] upd 
57:   Int removeFrom := sl.maxLevel 
58:   Node* pred := sl.head 
59:   Node* curr := pred.next[sl.maxLevel] 
60:   for i := sl.maxLevel downto 0 do 
61:     curr := pred.next[i] 
62:     while curr.val < v do 
63:       pred := curr 
64:       curr := pred.next[i] 
65:     end while 
66:     if curr.val ≠ v then 
67:       removeFrom := i − 1 
68:     end if 
69:     upd[i] := pred 
70:   end for 
71:   Bool valueWasIn := (curr.val = v) 
72:   if valueWasIn then 
73:     for i := removeFrom downto 0 do 
74:       upd[i].next[i] := curr.next[i] 
          +------------------------------------------+ 
          | if i = 0 then sl.reg := sl.reg \ {curr}  | 
          +------------------------------------------+ 
75:     end for 
76:     free(curr) 
77:   end if 
78:   return valueWasIn 
79: end procedure 

class Node { Value val; Key key; Array(Node*) next; Int @level; } 
class SkipList { Node* head; Node* tail; Int @maxLevel; Set(Addr) @reg; } 

Fig. 2. Most general client, Insert, Search and Remove algorithms for skiplists, and 
the classes Node and SkipList. 
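
To make the shape property concrete, the following Python sketch mirrors the structure of Insert and Remove, maintains the ghost region reg, and checks the three conditions above after every operation of a small random client. It is an illustrative approximation under stated assumptions, not the program of Fig. 2: the method names, the early return when the value is found, the choice to pass the random level as a parameter, and the convention that reg contains the two sentinels are all assumptions made here.

```python
import math
import random

class Node:
    def __init__(self, val, level):
        self.val = val
        self.level = level                  # ghost: highest level of next
        self.next = [None] * (level + 1)

class SkipList:
    def __init__(self):
        self.max_level = 0
        self.tail = Node(math.inf, 0)
        self.head = Node(-math.inf, 0)
        self.head.next[0] = self.tail
        self.reg = {self.head, self.tail}   # ghost: region (assumed to hold sentinels)

    def _find(self, v):
        """Predecessor of v at every level, and the level-0 candidate node."""
        upd = [self.head] * (self.max_level + 1)
        pred = self.head
        for i in range(self.max_level, -1, -1):
            curr = pred.next[i]
            while curr.val < v:
                pred, curr = curr, curr.next[i]
            upd[i] = pred
        return upd, curr

    def insert(self, v, lvl):
        while self.max_level < lvl:         # grow the sentinels first
            self.head.next.append(self.tail)
            self.tail.next.append(None)
            self.max_level += 1
        upd, curr = self._find(v)
        if curr.val == v:
            return False
        x = Node(v, lvl)
        for i in range(lvl + 1):
            x.next[i] = upd[i].next[i]
            upd[i].next[i] = x
            if i == 0:
                self.reg.add(x)             # ghost update
        return True

    def remove(self, v):
        upd, curr = self._find(v)
        if curr is self.tail or curr.val != v:
            return False
        for i in range(curr.level, -1, -1):
            upd[i].next[i] = curr.next[i]
            if i == 0:
                self.reg.discard(curr)      # ghost update
        return True

    def level_nodes(self, i):
        """Nodes on the level-i list from head on; stops if a node repeats."""
        out, seen, n = [], set(), self.head
        while n is not None:
            out.append(n)
            if n in seen:
                break
            seen.add(n)
            n = n.next[i]
        return out

    def check_shape(self):
        base = self.level_nodes(0)
        vals = [n.val for n in base]
        ordered = all(a < b for a, b in zip(vals, vals[1:]))
        nested = all(set(self.level_nodes(i + 1)) <= set(self.level_nodes(i))
                     for i in range(self.max_level))
        return ordered and nested and set(base) == self.reg

def most_general_client(steps, seed=0):
    """Random calls to insert/remove, checking the shape after each one."""
    rng = random.Random(seed)
    sl = SkipList()
    for _ in range(steps):
        v = rng.randrange(20)
        if rng.random() < 0.6:
            lvl = 0
            while rng.random() < 0.5:       # stand-in for randomLevel
                lvl += 1
            sl.insert(v, lvl)
        else:
            sl.remove(v)
        assert sl.check_shape()
    return sl
```

Such a randomized check only tests the property on the executions it happens to explore; the purpose of the theory presented next is to prove it for all executions.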

3 The Theory of Skiplists of Arbitrary Height: TSL 

We present in this section TSL: a theory to reason about skiplists of arbitrary 
height. Formally, TSL is a combination of different theories. 

We begin with a brief overview of notation and concepts. A signature $\Sigma$ is 
a triple $(S, F, P)$ where $S$ is a set of sorts, $F$ a set of functions and $P$ a set 
of predicates. If $\Sigma_1 = (S_1, F_1, P_1)$ and $\Sigma_2 = (S_2, F_2, P_2)$, we define 
$\Sigma_1 \cup \Sigma_2 = (S_1 \cup S_2, F_1 \cup F_2, P_1 \cup P_2)$. Similarly, we say that 
$\Sigma_1 \subseteq \Sigma_2$ when $S_1 \subseteq S_2$, $F_1 \subseteq F_2$ and $P_1 \subseteq P_2$. 
If $t$ ($\varphi$) is a term (resp. formula), then we denote with $V_\sigma(t)$ 
(resp. $V_\sigma(\varphi)$) the set of variables of sort $\sigma$ occurring in $t$ (resp. $\varphi$). Similarly, 
we denote with $C_\sigma(t)$ (resp. $C_\sigma(\varphi)$) the set of constants of sort $\sigma$ occurring in $t$ 
(resp. $\varphi$). 

A $\Sigma$-interpretation is a map from symbols in $\Sigma$ to values. A $\Sigma$-structure is 
a $\Sigma$-interpretation over an empty set of variables. A $\Sigma$-formula over a set $X$ 
of variables is satisfiable whenever it is true in some $\Sigma$-interpretation over $X$. 
Let $\Omega$ be a signature, $\mathcal{A}$ an $\Omega$-interpretation over a set $V$ of variables, $\Sigma \subseteq \Omega$ 
and $U \subseteq V$. $\mathcal{A}^{\Sigma,U}$ denotes the interpretation obtained from $\mathcal{A}$ by restricting it to 
interpret only the symbols in $\Sigma$ and the variables in $U$. We use $\mathcal{A}^{\Sigma}$ to denote 
$\mathcal{A}^{\Sigma,\emptyset}$. A $\Sigma$-theory is a pair $(\Sigma, \mathbf{A})$ where $\Sigma$ is a signature and $\mathbf{A}$ is a class of 
$\Sigma$-structures. Given a theory $T = (\Sigma, \mathbf{A})$, a $T$-interpretation is a $\Sigma$-interpretation 
$\mathcal{A}$ such that $\mathcal{A}^{\Sigma} \in \mathbf{A}$. Given a $\Sigma$-theory $T$, a $\Sigma$-formula $\varphi$ over a set of variables 
$X$ is $T$-satisfiable whenever it is true on a $T$-interpretation over $X$. 

Formally, the theory of skiplists of arbitrary height is defined as TSL = 
$(\Sigma_{TSL}, \mathbf{TSL})$, where $\Sigma_{TSL}$ is the union of the following signatures, shown in 
Fig. 3: 

$$\Sigma_{TSL} = \Sigma_{level} \cup \Sigma_{ord} \cup \Sigma_{array} \cup \Sigma_{cell} \cup \Sigma_{mem} \cup \Sigma_{reach} \cup \Sigma_{set} \cup \Sigma_{bridge}$$

and $\mathbf{TSL}$ is the class of $\Sigma_{TSL}$-structures satisfying the conditions listed in Fig. 4. 
Informally, sort addr represents addresses; elem the universe of elements that 
can be stored in the skiplist; level the levels of a skiplist; ord the ordered keys used 
to preserve a strict order in the skiplist; array corresponds to arrays of addresses, 
indexed by levels; cell models cells representing objects of class Node; mem 
models the heap, a map from addresses to cells; path describes finite sequences of 
non-repeating addresses to model non-cyclic list paths, while set models sets of 
addresses, also known as regions. 

The symbols in $\Sigma_{set}$ are interpreted according to their standard 
interpretations over sets of addresses. $\Sigma_{level}$ contains symbols $0$ and $s$ to build the natural 
numbers with the usual order. $\Sigma_{ord}$ models the order between elements, and 
contains two special elements $-\infty$ and $+\infty$ for the lowest and highest values in the 



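Signature | Sorts | Functions | Predicates 
$\Sigma_{level}$ | level | $0$ : level; $s$ : level $\to$ level | $<$ : level $\times$ level 
$\Sigma_{ord}$ | ord | $-\infty, +\infty$ : ord | $\preceq$ : ord $\times$ ord 
$\Sigma_{array}$ | array, level, addr | $\_[\_]$ : array $\times$ level $\to$ addr; $\_\{\_ \leftarrow \_\}$ : array $\times$ level $\times$ addr $\to$ array | 
$\Sigma_{cell}$ | cell, elem, ord, array, addr, level | error : cell; mkcell : elem $\times$ ord $\times$ array $\times$ level $\to$ cell; _.data : cell $\to$ elem; _.key : cell $\to$ ord; _.arr : cell $\to$ array; _.max : cell $\to$ level | 
$\Sigma_{mem}$ | mem, addr, cell | null : addr; rd : mem $\times$ addr $\to$ cell; upd : mem $\times$ addr $\times$ cell $\to$ mem | 
$\Sigma_{reach}$ | mem, addr, level, path | $\epsilon$ : path; $[\_]$ : addr $\to$ path | append : path $\times$ path $\times$ path; reach : mem $\times$ addr $\times$ addr $\times$ level $\times$ path 
$\Sigma_{set}$ | addr, set | $\emptyset$ : set; $\{\_\}$ : addr $\to$ set; $\cup, \cap, \setminus$ : set $\times$ set $\to$ set | $\in$ : addr $\times$ set; $\subseteq$ : set $\times$ set 
$\Sigma_{bridge}$ | mem, addr, set, path, level | path2set : path $\to$ set; addr2set : mem $\times$ addr $\times$ level $\to$ set; getp : mem $\times$ addr $\times$ addr $\times$ level $\to$ path | ordList : mem $\times$ path; skiplist : mem $\times$ set $\times$ level $\times$ addr $\times$ addr 

Fig. 3. The signature of the TSL theory 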
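Each sort $\sigma$ in $\Sigma_{TSL}$ is mapped to a non-empty set $\mathcal{A}_\sigma$ such that: 
(a) $\mathcal{A}_{addr}$ and $\mathcal{A}_{elem}$ are discrete sets 
(b) $\mathcal{A}_{level}$ is the naturals with order 
(c) $\mathcal{A}_{ord}$ is a totally ordered set 
(d) $\mathcal{A}_{array} = \mathcal{A}_{addr}^{\mathcal{A}_{level}}$ 
(e) $\mathcal{A}_{cell} = \mathcal{A}_{elem} \times \mathcal{A}_{ord} \times \mathcal{A}_{array} \times \mathcal{A}_{level}$ 
(f) $\mathcal{A}_{path}$ is the set of all finite sequences of (pairwise) distinct elements of $\mathcal{A}_{addr}$ 
(g) $\mathcal{A}_{mem} = \mathcal{A}_{cell}^{\mathcal{A}_{addr}}$ 
(h) $\mathcal{A}_{set}$ is the power-set of $\mathcal{A}_{addr}$ 

Signature | Interpretation 

$\Sigma_{level}$: 
• $0^{\mathcal{A}} = 0$ 
• $s^{\mathcal{A}}(l) = s(l)$, for each $l \in \mathcal{A}_{level}$ 

$\Sigma_{ord}$: 
• $x \preceq^{\mathcal{A}} y \wedge y \preceq^{\mathcal{A}} x \to x = y$ 
• $x \preceq^{\mathcal{A}} y \wedge y \preceq^{\mathcal{A}} z \to x \preceq^{\mathcal{A}} z$ 
• $x \preceq^{\mathcal{A}} y \vee y \preceq^{\mathcal{A}} x$ 
• $-\infty^{\mathcal{A}} \preceq^{\mathcal{A}} x \wedge x \preceq^{\mathcal{A}} +\infty^{\mathcal{A}}$ 
for any $x, y, z \in \mathcal{A}_{ord}$ 

$\Sigma_{array}$: 
• $A[l]^{\mathcal{A}} = A(l)$ 
• $A\{l \leftarrow a\}^{\mathcal{A}} = A_{new}$, where $A_{new}(l) = a$ and $A_{new}(i) = A(i)$ for $i \neq l$ 
for each $A, A_{new} \in \mathcal{A}_{array}$, $l \in \mathcal{A}_{level}$ and $a \in \mathcal{A}_{addr}$ 

$\Sigma_{cell}$: 
• $mkcell^{\mathcal{A}}(e, k, A, l) = \langle e, k, A, l \rangle$ 
• $error^{\mathcal{A}}.arr^{\mathcal{A}}(l) = null^{\mathcal{A}}$ 
• $\langle e, k, A, l \rangle.data^{\mathcal{A}} = e$ 
• $\langle e, k, A, l \rangle.key^{\mathcal{A}} = k$ 
• $\langle e, k, A, l \rangle.arr^{\mathcal{A}} = A$ 
• $\langle e, k, A, l \rangle.max^{\mathcal{A}} = l$ 
for each $e \in \mathcal{A}_{elem}$, $k \in \mathcal{A}_{ord}$, $A \in \mathcal{A}_{array}$, and $l \in \mathcal{A}_{level}$ 

$\Sigma_{mem}$: 
• $rd(m, a)^{\mathcal{A}} = m(a)$ 
• $upd^{\mathcal{A}}(m, a, c) = m_{a \mapsto c}$ 
• $m^{\mathcal{A}}(null^{\mathcal{A}}) = error^{\mathcal{A}}$ 
for each $m \in \mathcal{A}_{mem}$, $a \in \mathcal{A}_{addr}$ and $c \in \mathcal{A}_{cell}$ 

$\Sigma_{reach}$: 
• $\epsilon^{\mathcal{A}}$ is the empty sequence 
• $[a]^{\mathcal{A}}$ is the sequence containing $a \in \mathcal{A}_{addr}$ as the only element 
• $([a_1 .. a_n], [b_1 .. b_m], [a_1 .. a_n, b_1 .. b_m]) \in append^{\mathcal{A}}$ iff $a_k \neq b_l$ 
• $(m, a_{init}, a_{end}, l, p) \in reach^{\mathcal{A}}$ iff $a_{init} = a_{end}$ and $p = \epsilon$, or there exist 
  addresses $a_1, \ldots, a_n \in \mathcal{A}_{addr}$ such that: 
  (a) $p = [a_1 .. a_n]$ 
  (b) $a_1 = a_{init}$ 
  (c) $m(a_r).arr^{\mathcal{A}}(l) = a_{r+1}$, for $r < n$ 
  (d) $m(a_n).arr^{\mathcal{A}}(l) = a_{end}$ 

$\Sigma_{bridge}$: 
for each $m \in \mathcal{A}_{mem}$, $p \in \mathcal{A}_{path}$, $l \in \mathcal{A}_{level}$, $a_i, a_e \in \mathcal{A}_{addr}$, $r \in \mathcal{A}_{set}$ 
• $path2set^{\mathcal{A}}(p) = \{a_1, \ldots, a_n\}$ for $p = [a_1, \ldots, a_n] \in \mathcal{A}_{path}$ 
• $addr2set^{\mathcal{A}}(m, a, l) = \{a' \mid \exists p \in \mathcal{A}_{path} \,.\, (m, a, a', l, p) \in reach^{\mathcal{A}}\}$ 
• $getp^{\mathcal{A}}(m, a_i, a_e, l) = p$ if $(m, a_i, a_e, l, p) \in reach^{\mathcal{A}}$, and $\epsilon$ otherwise 
• $ordList^{\mathcal{A}}(m, p)$ iff $p = \epsilon$ or $p = [a]$, or $p = [a_1, \ldots, a_n]$ with $n \geq 2$ and 
  $m(a_j).key \preceq m(a_{j+1}).key$ for all $1 \leq j < n$, for any $m \in \mathcal{A}_{mem}$ 
• $skiplist^{\mathcal{A}}(m, r, l, a_i, a_e)$ iff 
  $ordList^{\mathcal{A}}(m, getp^{\mathcal{A}}(m, a_i, a_e, 0))$ $\wedge$ 
  $r = addr2set^{\mathcal{A}}(m, a_i, 0)$ $\wedge$ 
  $0 \leq l \wedge \forall a \in r \,.\, m(a).max^{\mathcal{A}} \leq l$ $\wedge$ 
  $m(a_e).arr^{\mathcal{A}}(l) = null^{\mathcal{A}}$ $\wedge$ 
  $\big( (0 = l) \vee (\exists l_p \,.\, s^{\mathcal{A}}(l_p) = l \wedge \forall i \in 0, \ldots, l_p \,.\, 
  m(a_e).arr^{\mathcal{A}}(i) = null^{\mathcal{A}} \wedge 
  path2set^{\mathcal{A}}(getp^{\mathcal{A}}(m, a_i, a_e, s^{\mathcal{A}}(i))) \subseteq 
  path2set^{\mathcal{A}}(getp^{\mathcal{A}}(m, a_i, a_e, i))) \big)$ 

Fig. 4. Characterization of a TSL-interpretation $\mathcal{A}$ 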



To decide whether $\varphi_{in}$ : TSL is SAT: 

STEP 1. Sanitize: $\varphi := \varphi_{in} \wedge \bigwedge_{B = A\{l \leftarrow a\} \in \varphi_{in}} (l_{new} = l + 1)$ 
STEP 2. Guess arrangement $\alpha$ of $V_{level}(\varphi)$. 
STEP 3. Split $\varphi$ into $(\varphi^{PA} \wedge \alpha)$ and $(\varphi^{NC} \wedge \alpha)$. 
STEP 4. Check SAT of $(\varphi^{PA} \wedge \alpha)$. 
        If UNSAT -> return UNSAT 
STEP 5. Check SAT of $(\varphi^{NC} \wedge \alpha)$ as follows: 
        5.1 Let $k = |V_{level}(\varphi^{NC} \wedge \alpha)|$. 
        5.2 Check $\ulcorner \varphi^{NC} \wedge \alpha \urcorner$ : TSL_K($k$): 
            If SAT -> return SAT 
            else return UNSAT. 

Fig. 5. A decision procedure for the satisfiability of TSL formulas (left). A split of $\varphi$ 
obtained after STEP 1 into $\varphi^{PA}$ and $\varphi^{NC}$ (right). 



order $\preceq$. $\Sigma_{array}$ is the theory of arrays defining two operations: $A[i]$ to capture 
the element of sort addr stored in array $A$ at position given by $i$ of sort level, 
and $A\{i \leftarrow a\}$ for an array write, which defines the array that results from $A$ by 
replacing the element at position $i$ with $a$. $\Sigma_{cell}$ contains the constructors and 
selectors for building and inspecting cells, including error for incorrect 
dereferences. $\Sigma_{mem}$ is the signature for heaps, with the usual memory access and single 
memory mutation functions. $\Sigma_{set}$ is the theory of finite sets of addresses. The 
signature $\Sigma_{reach}$ contains predicates to check reachability of addresses using paths 
at different levels. Finally, $\Sigma_{bridge}$ contains auxiliary functions and predicates 
to manipulate and inspect paths as well as a native predicate for the skiplist 
memory shape. 
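
As an informal aid to reading Fig. 4, the intended meaning of some of these symbols can be sketched over a finite heap in Python. Everything about the encoding is an assumption made for this illustration: addresses are integers, null is 0, a heap is a dict from addresses to cells, arrays are finite tuples whose missing positions read as null, and unallocated addresses read as the error cell.

```python
import math
from typing import NamedTuple

NULL = 0  # assumed encoding of the null address

class Cell(NamedTuple):
    data: object
    key: float
    arr: tuple   # arr[l] is the successor address at level l
    max: int

ERROR = Cell(None, None, (), 0)  # every position of its array reads as NULL

def rd(heap, a):
    """Memory access; null and unallocated addresses yield the error cell."""
    return heap.get(a, ERROR)

def upd(heap, a, c):
    """Single memory mutation, returning a new heap."""
    new = dict(heap)
    new[a] = c
    return new

def succ(heap, a, l):
    """Successor of address a at level l (NULL beyond the stored array)."""
    arr = rd(heap, a).arr
    return arr[l] if l < len(arr) else NULL

def getp(heap, a_init, a_end, l):
    """Path from a_init up to (excluding) a_end at level l, or () if none."""
    path, a = [], a_init
    while a != a_end:
        if a == NULL or a in path:   # dead end or cycle: a_end is unreachable
            return ()
        path.append(a)
        a = succ(heap, a, l)
    return tuple(path)

def path2set(path):
    return set(path)

def addr2set(heap, a, l):
    """All addresses reachable from a by following level-l pointers."""
    seen, cur = set(), a
    while cur not in seen:
        seen.add(cur)
        cur = succ(heap, cur, l)
    return seen

def ord_list(heap, path):
    """Keys along the path are non-decreasing."""
    keys = [rd(heap, a).key for a in path]
    return all(k1 <= k2 for k1, k2 in zip(keys, keys[1:]))

# A two-level example heap: 1 is the head, 4 is the tail.
HEAP = {
    1: Cell(None, -math.inf, (2, 3), 1),
    2: Cell("a", 5, (3,), 0),
    3: Cell("b", 9, (4, 4), 1),
    4: Cell(None, math.inf, (NULL, NULL), 1),
}
```

In this example the level-1 path from the head to the tail is (1, 3) and the level-0 path is (1, 2, 3), so the set of the former is included in the set of the latter, as the skiplist predicate requires.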

4 Decidability of TSL 

Fig. 5 shows a decision procedure for the satisfiability problem of TSL formulas, 
by a reduction to satisfiability of quantifier-free TSL_K formulas and 
quantifier-free Presburger arithmetic formulas. We start from a TSL formula $\varphi$ in 
disjunctive normal form: $\varphi_1 \vee \cdots \vee \varphi_n$, so the procedure only needs to check the 
satisfiability of a conjunction of TSL literals $\varphi_i$. The rest of this section describes 
the decision procedure and proves its correctness. 

A flat literal is of the form $x = y$, $x \neq y$, $x = f(y_1, \ldots, y_n)$, $p(y_1, \ldots, y_n)$ or 
$\neg p(y_1, \ldots, y_n)$, where $x, y, y_1, \ldots, y_n$ are variables, $f$ is a function symbol and $p$ 
is a predicate symbol defined in the signature of TSL. We first identify a set of 
normalized literals. All other literals can be converted into normalized literals. 

Definition 1. A normalized TSL-literal is a flat literal of the form: 

$e_1 \neq e_2$              $a_1 \neq a_2$ 
$a = null$                  $c = error$                     $c = rd(m, a)$ 
$k_1 \neq k_2$              $k_1 \preceq k_2$               $m_2 = upd(m_1, a, c)$ 
$c = mkcell(e, k, A, l)$    $l_1 < l_2$                     $l = q$ 
$s = \{a\}$                 $s_1 = s_2 \cup s_3$            $s_1 = s_2 \setminus s_3$ 
$a = A[l]$                  $B = A\{l \leftarrow a\}$ 
$p_1 \neq p_2$              $p = [a]$                       $p_1 = rev(p_2)$ 
$s = path2set(p)$           $append(p_1, p_2, p_3)$         $\neg append(p_1, p_2, p_3)$ 
$s = addr2set(m, a, l)$     $p = getp(m, a_1, a_2, l)$ 
$ordList(m, p)$             $skiplist(m, s, a_1, a_2)$ 

where $e$, $e_1$ and $e_2$ are elem-variables; $a$, $a_1$ and $a_2$ are addr-variables; $c$ is a 
cell-variable; $m$, $m_1$ and $m_2$ are mem-variables; $p$, $p_1$, $p_2$ and $p_3$ are path-variables; 
$s$, $s_1$, $s_2$ and $s_3$ are set-variables; $A$ and $B$ are array-variables; $k$, $k_1$ and $k_2$ are 
ord-variables; $l$, $l_1$ and $l_2$ are level-variables, and $q$ is a level constant. 

The set of non-normalized literals consists of all flat literals not given in 
Definition 1. For instance, $e = c.data$ can be rewritten as 
$\exists_{ord} k \; \exists_{array} A \; \exists_{level} l \mid c = mkcell(e, k, A, l)$, 
and $reach(m, a_1, a_2, l, p)$ can be translated into the equivalent 
formula $a_2 \in addr2set(m, a_1, l) \wedge p = getp(m, a_1, a_2, l)$. 

Lemma 1. Every TSL-formula is equivalent to a collection of conjunctions of 
normalized TSL-literals. 

For example, consider the skiplist presented in Fig. 1 and the following 
formula $\psi$ that we will use as a running example: 

$$\psi : \; i = 0 \wedge A = rd(heap, head).arr \wedge B = A\{i \leftarrow tail\}.$$

This formula establishes that $B$ is an array that is equal to the next pointers of 
node head, except for the lowest level, which now contains the address of tail. To 
check the satisfiability of this formula we first normalize it, obtaining $\psi_{norm}$: 

$$\psi_{norm} : \; i = 0 \wedge \big( c = rd(heap, head) \wedge c = mkcell(e, k, A, l) \wedge l = 3 \big) \wedge B = A\{i \leftarrow tail\}.$$



4.1 STEP 1: Sanitation 

The decision procedure begins with STEP 1 by sanitizing 
the normalized collection of literals received as input. 

Definition 2 (Sanitized). A conjunction of normalized literals is sanitized if 
for every literal $B = A\{l \leftarrow a\}$ there is a literal of the form $l_{new} = l + 1$, where 
$l_{new}$ is a newly introduced variable if necessary. 

The fresh level variables in sanitized formulas will be used later, in the proof of 
Theorem 1 below, to construct a proper model by replicating level $l_{new}$ instead of 
level $l$. In turn, sanitation allows us to show the existence of models with constants 
from models of sub-formulas without constants. Sanitizing a formula does not 
affect its satisfiability because it only adds an arithmetic constraint ($l_{new} = l + 1$) 
for a fresh new variable $l_{new}$. Hence, a model of $\varphi$ (the sanitized formula) is a 
model for $\varphi_{in}$ (the input formula), and from a model of $\varphi_{in}$ one can immediately 
build a model of $\varphi$ by computing the values of the variables $l_{new}$. Considering 
again our example, after sanitizing $\psi_{norm}$ we obtain $\psi_{sanit}$: 

$$\psi_{sanit} : \; \psi_{norm} \wedge l_{new} = i + 1.$$
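
The sanitation step is simple enough to sketch directly. The following Python fragment is only an illustration: the encoding of literals as tagged tuples, the tag names and the scheme used to generate fresh variable names are all assumptions of this sketch, and the fresh names are assumed not to clash with variables already in the formula.

```python
def sanitize(literals):
    """Add a literal l_new = l + 1 for every array write B = A{l <- a}
    whose level l does not already have such a successor literal."""
    out = list(literals)
    # Levels l for which some literal (succ, l_new, l) is already present.
    has_succ = {lit[2] for lit in literals if lit[0] == "succ"}
    fresh = 0
    for lit in literals:
        if lit[0] == "write":          # ("write", B, A, l, a)
            level = lit[3]
            if level not in has_succ:
                out.append(("succ", f"l_new{fresh}", level))
                has_succ.add(level)
                fresh += 1
    return out

# The normalized running example, in the assumed tuple encoding.
PSI_NORM = [
    ("eq_const", "i", 0),                  # i = 0
    ("rd", "c", "heap", "head"),           # c = rd(heap, head)
    ("mkcell", "c", "e", "k", "A", "l"),   # c = mkcell(e, k, A, l)
    ("eq_const", "l", 3),                  # l = 3
    ("write", "B", "A", "i", "tail"),      # B = A{i <- tail}
]
PSI_SANIT = sanitize(PSI_NORM)
```

Applied to the running example, the sketch appends a single literal stating that a fresh level variable equals i + 1, and applying it a second time adds nothing.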

4.2 STEP 2: Order arrangements, and STEP 3: Split 

In a given model of 
a formula, every level variable is assigned a natural number. Hence, every two 
variables are either assigned the same value or their values are ordered. We call 
these order predicates an order arrangement. Since there is a finite number of 
level variables, there is a finite number of possible order arrangements. STEP 2 
consists of guessing one order arrangement. 

STEP 3 uses the order arrangement to reduce the satisfiability of a sanitized 
formula that follows an order arrangement to the satisfiability of a Presburger 
Arithmetic formula (checked in STEP 4), and the satisfiability of a sanitized 
formula without constants (checked in STEP 5). An essential element in the 
construction is the notion of gaps. The ability to introduce gaps in models allows 
us to show that if a model for the formula without constants exists, then a model 
for the formula with constants also exists (provided the Presburger constraints 
are also met). 

Definition 3 (Gap). Let $\mathcal{A}$ be a model of $\varphi$. We say that $n \in \mathbb{N}$ is a gap in $\mathcal{A}$ 
if there are variables $l_1, l_2$ in $V_{level}(\varphi)$ such that $l_1^{\mathcal{A}} < n < l_2^{\mathcal{A}}$, but there is no $l$ in 
$V_{level}(\varphi)$ with $l^{\mathcal{A}} = n$. 

Consider $\psi_{sanit}$, for which $V_{level}(\psi_{sanit}) = \{i, l_{new}, l\}$. A model $\mathcal{A}_\psi$ that interprets 
variables $i$, $l_{new}$ and $l$ as 0, 1 and 3 respectively has a gap at 2. A gap-less model 
is a model without gaps, either between two level variables or above any level 
variable. 

Definition 4 (Gap-less model). A model $\mathcal{A}$ of $\varphi$ is a gap-less model whenever 
it has no gaps, and for every array $C$ in $array^{\mathcal{A}}$ and level $n > l^{\mathcal{A}}$ for all $l \in 
V_{level}(\varphi)$, $C(n) = null$. 
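
The two definitions above can be checked mechanically on a finite description of a model. The Python sketch below is illustrative only: it assumes that a model is given by a dict assigning a natural number to each level variable and by arrays represented as finite lists (positions beyond the list are taken to be null, encoded as 0). The function remove_gap applies the renumbering of levels that closes a gap, which is the idea used by the gap-reduction argument later in this section.

```python
def gaps(assignment):
    """Levels strictly between two assigned levels that no variable takes."""
    used = set(assignment.values())
    if not used:
        return []
    return [n for n in range(min(used), max(used)) if n not in used]

def is_gapless(assignment, arrays, null=0):
    """No gaps, and every array is null above the highest assigned level."""
    top = max(assignment.values())
    null_above = all(a == null for arr in arrays for a in arr[top + 1:])
    return not gaps(assignment) and null_above

def remove_gap(assignment, n):
    """Shift every level above the gap n down by one; keep the others."""
    return {v: (x - 1 if x > n else x) for v, x in assignment.items()}

# The model of the running example discussed above: i, l_new, l -> 0, 1, 3.
A_PSI = {"i": 0, "l_new": 1, "l": 3}
```

For the assignment of the running example the only gap is 2, and closing it maps l to 2 while leaving i and l_new unchanged.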

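The following intermediate definition and lemma greatly simplify subsequent 
constructions by relating the satisfaction of literals between two models that 
agree on most sorts and the connectivity of relevant levels. 

Definition 5. Two interpretations $\mathcal{A}$ and $\mathcal{B}$ of a formula $\varphi$ agree on sorts $\sigma$ 
whenever $\mathcal{A}_\sigma = \mathcal{B}_\sigma$ and 
(i) for every $v \in V_\sigma(\varphi)$, $v^{\mathcal{A}} = v^{\mathcal{B}}$, 
(ii) for every function symbol $f$ with domain and codomain from sorts in $\sigma$, 
$f^{\mathcal{A}} = f^{\mathcal{B}}$, and for every predicate symbol $P$ with domain in $\sigma$, $P^{\mathcal{A}}$ iff $P^{\mathcal{B}}$. 

Lemma 2. Let $\mathcal{A}$ and $\mathcal{B}$ be two interpretations of a sanitized formula $\varphi$ that 
agree on $\sigma$ : {addr, elem, ord, path, set}, and such that for every $l \in V_{level}(\varphi)$, 
$m \in V_{mem}(\varphi)$, and $a \in addr^{\mathcal{A}}$: $m^{\mathcal{A}}(a).arr^{\mathcal{A}}(l^{\mathcal{A}}) = m^{\mathcal{B}}(a).arr^{\mathcal{B}}(l^{\mathcal{B}})$. It follows that 
$reach^{\mathcal{A}}(m^{\mathcal{A}}, a_{init}^{\mathcal{A}}, a_{end}^{\mathcal{A}}, l^{\mathcal{A}}, p^{\mathcal{A}})$ if and only if $reach^{\mathcal{B}}(m^{\mathcal{B}}, a_{init}^{\mathcal{B}}, a_{end}^{\mathcal{B}}, l^{\mathcal{B}}, p^{\mathcal{B}})$. 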
We show now that if a sanitized formula without constants, such as the one 
obtained after the split in STEP 3, has a model, then it has a model without gaps. 

Lemma 3 (Gap-reduction). Let $\mathcal{A}$ be a model of a sanitized formula $\varphi$ 
without constants, and let $\mathcal{A}$ have a gap at $n$. Then, there is a model $\mathcal{B}$ of $\varphi$ such 
that, for every $l \in V_{level}(\varphi)$: $l^{\mathcal{B}} = l^{\mathcal{A}} - 1$ if $l^{\mathcal{A}} > n$, and $l^{\mathcal{B}} = l^{\mathcal{A}}$ if $l^{\mathcal{A}} < n$. The 
number of gaps in $\mathcal{B}$ is one less than in $\mathcal{A}$. 

Proof. (Sketch) We show here the construction of the model and leave the 
exhaustive case analysis of each literal for the appendix. Let $\mathcal{A}$ be a model of $\varphi$ 
with a gap at $n$. We build a model $\mathcal{B}$ with the condition in the lemma as follows. 
$\mathcal{B}$ agrees with $\mathcal{A}$ on addr, elem, ord, path, set. In particular, $v^{\mathcal{B}} = v^{\mathcal{A}}$ for variables 
of these sorts. For the other sorts we let $\mathcal{B}_\sigma = \mathcal{A}_\sigma$ for $\sigma$ = level, array, cell, mem. 
We define the following transformation maps: 

$$\beta_{level}(j) = \begin{cases} j & \text{if } j < n \\ j - 1 & \text{otherwise} \end{cases} \qquad \beta_{array}(A)(i) = \begin{cases} A(i) & \text{if } i < n \\ A(i + 1) & \text{if } i \geq n \end{cases}$$

$$\beta_{cell}((e, k, A, l)) = (e, k, \beta_{array}(A), \beta_{level}(l)) \qquad \beta_{mem}(m)(a) = \beta_{cell}(m(a))$$

Now we are ready to define the valuations of variables $l$ : level, $A$ : array, 
$c$ : cell and $m$ : mem: 

$$l^{\mathcal{B}} = \beta_{level}(l^{\mathcal{A}}) \qquad A^{\mathcal{B}} = \beta_{array}(A^{\mathcal{A}}) \qquad c^{\mathcal{B}} = \beta_{cell}(c^{\mathcal{A}}) \qquad m^{\mathcal{B}} = \beta_{mem}(m^{\mathcal{A}})$$

The interpretation of all functions and predicates is preserved from $\mathcal{A}$. An 
exhaustive case analysis on the normalized literals allows us to show that $\mathcal{B}$ is indeed 
a model of $\varphi$. □ 

For instance, considering formula $\psi_{sanit}$ and model $\mathcal{A}_\psi$, we can construct 
model $\mathcal{B}_\psi$ reducing one gap from $\mathcal{A}_\psi$ by stating that $i^{\mathcal{B}_\psi} = i^{\mathcal{A}_\psi}$, $l_{new}^{\mathcal{B}_\psi} = l_{new}^{\mathcal{A}_\psi}$ 
and $l^{\mathcal{B}_\psi} = 2$, and completely ignoring arrays in model $\mathcal{A}_\psi$ at level 2. 

Lemma 4 (Top-reduction). Let $\mathcal{A}$ be a model of $\varphi$, and $n$ a level such that 
$n > l^{\mathcal{A}}$ for all $l \in V_{level}(\varphi)$, and let $A \in array^{\mathcal{A}}$ be such that $A(n) \neq null$. Then the 
interpretation $\mathcal{B}$ obtained by replacing $A(n) = null$ is also a model of $\varphi$. 

Proof. By a simple case analysis on the literals of $\varphi$, using Lemma 2. □ 

Corollary 1. Let $\varphi$ be a sanitized formula without constants. Then, $\varphi$ has a 
model if and only if $\varphi$ has a gapless model. 

STEP 2 in the decision procedure guesses an order arrangement of level 
variables from the sanitized formula $\varphi$. Informally, an order arrangement is a total 
order between the equivalence classes of level variables. 

Definition 6 (Order Arrangement). Given a sanitized formula $\varphi$, an order 
arrangement is a collection of literals containing, for every pair of level variables 
$l_1, l_2 \in V_{level}(\varphi)$, exactly one of: $(l_1 = l_2)$, $(l_1 < l_2)$, or $(l_2 < l_1)$. 
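
Since an order arrangement is a total order between equivalence classes of level variables, the arrangements that are consistent with some assignment of numbers can be enumerated as ordered partitions of the set of variables. The Python sketch below does this by brute force; the function names and the representation of literals as triples are assumptions of this illustration, and a practical procedure would guess or prune arrangements rather than list them all.

```python
from itertools import combinations

def weak_orderings(variables):
    """Yield every ordered partition of the variables into classes,
    lowest class first."""
    variables = list(variables)
    if not variables:
        yield []
        return
    for size in range(1, len(variables) + 1):
        for first in combinations(variables, size):
            rest = [v for v in variables if v not in first]
            for tail in weak_orderings(rest):
                yield [set(first)] + tail

def arrangement(blocks):
    """The literals (l1 = l2) or (l1 < l2) induced by an ordered partition."""
    rank = {v: r for r, block in enumerate(blocks) for v in block}
    literals = set()
    for a, b in combinations(sorted(rank), 2):
        if rank[a] == rank[b]:
            literals.add((a, "=", b))
        elif rank[a] < rank[b]:
            literals.add((a, "<", b))
        else:
            literals.add((b, "<", a))
    return literals

ALL_ARRANGEMENTS = [arrangement(b)
                    for b in weak_orderings(["i", "l", "l_new"])]
```

For three level variables there are 13 such arrangements, which illustrates why the number of guesses in STEP 2 is finite but grows quickly with the number of level variables.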
For instance, an order arrangement of $\psi_{sanit}$ is $\{i < l_{new}, \; i < l, \; l_{new} < l\}$. As 
depicted in Fig. 5 (right), STEP 3 of the decision procedure splits the sanitized 
formula $\varphi$ into $\varphi^{PA}$, which contains precisely all those literals in the theory of 
arithmetic $\Sigma_{level}$, and $\varphi^{NC}$, containing all literals from $\varphi$ except those involving 
constants ($l = q$). Clearly, $\varphi$ is equivalent to $\varphi^{NC} \wedge \varphi^{PA}$. In our case, $\psi_{sanit}$ is 
split into $\psi^{PA}$ and $\psi^{NC}$: 

$$\psi^{PA} : \; i = 0 \wedge l = 3 \wedge l_{new} = i + 1$$

$$\psi^{NC} : \; \big( c = rd(heap, head) \wedge c = mkcell(e, k, A, l) \big) \wedge B = A\{i \leftarrow tail\} \wedge l_{new} = i + 1.$$
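
The split itself is a simple filtering of literals, as the following Python sketch suggests. The tagged-tuple encoding of literals and the tag names are assumptions of this illustration, not notation used elsewhere in the paper.

```python
# Tags of literals that belong to the arithmetic of levels (assumed names).
LEVEL_KINDS = {"eq_const", "succ", "lt", "neq_level"}

def split(literals):
    """Return (phi_PA, phi_NC): the level-arithmetic literals, and all
    literals except those that mention a constant (l = q)."""
    phi_pa = [lit for lit in literals if lit[0] in LEVEL_KINDS]
    phi_nc = [lit for lit in literals if lit[0] != "eq_const"]
    return phi_pa, phi_nc

# The sanitized running example in the assumed encoding.
PSI_SANIT = [
    ("eq_const", "i", 0),                  # i = 0
    ("rd", "c", "heap", "head"),           # c = rd(heap, head)
    ("mkcell", "c", "e", "k", "A", "l"),   # c = mkcell(e, k, A, l)
    ("eq_const", "l", 3),                  # l = 3
    ("write", "B", "A", "i", "tail"),      # B = A{i <- tail}
    ("succ", "l_new", "i"),                # l_new = i + 1
]
PSI_PA, PSI_NC = split(PSI_SANIT)
```

Note that the literal relating l_new and i appears in both parts, as in the example above, while the two literals with constants appear only in the arithmetic part.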

For a given formula there is only a finite collection of order arrangements 
satisfying $\varphi^{PA}$. We use $arr(\varphi^{PA})$ for the set of order arrangements of variables 
satisfying $\varphi^{PA}$. A model of $\varphi^{PA}$ is characterized by a map $f : V_{level}(\varphi) \to \mathbb{N}$ 
assigning a natural number to each level variable. In the case of $\psi^{PA}$, $f$ maps 
$i$, $l_{new}$ and $l$ to 0, 1 and 3 respectively. Also, for every model $f$ of $\varphi^{PA}$ there 
is a unique order arrangement $\alpha \in arr(\varphi^{PA})$ for which $f \models \alpha$. STEP 4 consists 
of checking whether there is a model of $\varphi^{PA}$ that corresponds to a given order 
arrangement $\alpha$ by simply checking the satisfiability of the Presburger arithmetic 
formula $(\varphi^{PA} \wedge \alpha)$. 

We are now ready to show that the guess in STEP 2 and the split in STEP 
3 preserve satisfiability. Theorem 1 below allows us to reduce the satisfiability of 
$\varphi$ to the satisfiability of a Presburger Arithmetic formula and the satisfiability 
of a TSL formula without constants. We show in the next section how to decide 
this fragment of TSL. 

Theorem 1. A sanitized TSL formula $\varphi$ is satisfiable if and only if for some 
order arrangement $\alpha$, both $(\varphi^{PA} \wedge \alpha)$ and $(\varphi^{NC} \wedge \alpha)$ are satisfiable. 

4.3 STEP 4: Presburger Constraints 

The formula $\varphi^{PA}$ contains only literals 
of the form $l_1 = q$, $l_1 \neq l_2$, $l_1 = l_2 + 1$, and $l_1 < l_2$ for integer variables $l_1$ and $l_2$ 
and integer constant $q$. The satisfiability of this kind of formula can be easily 
decided with off-the-shelf SMT solvers. If $\varphi^{PA}$ is unsatisfiable then the original 
formula (for the guessed order arrangement) is also unsatisfiable. 
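
To illustrate what STEP 4 checks, the sketch below searches for an assignment of natural numbers to level variables that satisfies a list of literals of the four forms above. It is a stand-in for an SMT solver, not a decision procedure: the search is restricted to values up to a bound supplied by the caller (an assumption of this sketch), so a result of None only means that no model exists within that bound. The tuple encoding of literals is also assumed.

```python
from itertools import product

def holds(lit, f):
    """Evaluate one level literal under the assignment f."""
    kind = lit[0]
    if kind == "eq_const":          # l = q
        return f[lit[1]] == lit[2]
    if kind == "succ":              # l1 = l2 + 1
        return f[lit[1]] == f[lit[2]] + 1
    if kind == "lt":                # l1 < l2
        return f[lit[1]] < f[lit[2]]
    if kind == "eq":                # l1 = l2
        return f[lit[1]] == f[lit[2]]
    if kind == "neq":               # l1 != l2
        return f[lit[1]] != f[lit[2]]
    raise ValueError(f"unknown literal kind: {kind}")

def find_model(literals, bound):
    """Try every assignment with values in 0..bound; return one that
    satisfies all literals, or None if there is none within the bound."""
    variables = sorted({x for lit in literals for x in lit[1:]
                        if isinstance(x, str)})
    for values in product(range(bound + 1), repeat=len(variables)):
        f = dict(zip(variables, values))
        if all(holds(lit, f) for lit in literals):
            return f
    return None

PSI_PA = [("eq_const", "i", 0), ("eq_const", "l", 3), ("succ", "l_new", "i")]
ALPHA = [("lt", "i", "l_new"), ("lt", "i", "l"), ("lt", "l_new", "l")]
```

For the running example, the arithmetic part together with the arrangement i < l_new < l has the model that maps i, l_new and l to 0, 1 and 3, whereas an arrangement containing l < i is rejected.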

4.4 STEP 5: Deciding Satisfiability of Formulas Without Constants 

We show here the correctness of the reduction of the satisfiability of a sanitized 
formula without constants to the satisfiability of a formula in the decidable 
theory TSL_K (STEP 5). That is, we detail how to generate from a sanitized 
formula without constants $\varphi$ (formula $(\varphi^{NC} \wedge \alpha)$ in Fig. 5) an equisatisfiable 
TSL_K formula $\ulcorner \varphi \urcorner$ for a finite value $K$ computed from the formula. The bound is 
$K = |V_{level}(\varphi)|$. This bound limits the number of levels required in the reasoning. 
We use $[K]$ as a shorthand for the set $0 \ldots K - 1$. For $\psi_{sanit}$, we have $K = 3$ and thus 
we construct a formula in TSL_3. 

The translation from $\psi$ into $\lceil\psi\rceil$ works as follows. For every variable $A$ of
sort array appearing in some literal in $\psi$ we introduce $K$ fresh new variables
$v_{A[0]}, \ldots, v_{A[K-1]}$ of sort addr. These variables correspond to the addresses from
$A$ that the decision procedure for TSL$_K$ needs to reason about. All literals from
$\psi$ are left unchanged in $\lceil\psi\rceil$ except $(c = mkcell(e, k, A, l))$, $(a = A[l])$,
$(B = A\{l \leftarrow a\})$, $B = A$ and $skiplist(m, s, a_1, a_2)$, which are changed as follows:

- $c = mkcell(e, k, A, l)$ is transformed into $c = (e, k, v_{A[0]}, \ldots, v_{A[K-1]})$.

- $a = A[l]$ gets translated into:

\[ \bigwedge_{i = 0 \ldots K-1} \big( l = i \rightarrow a = v_{A[i]} \big) \]

- $B = A\{l \leftarrow a\}$ is translated into:

\[ \Big( \bigwedge_{i = 0 \ldots K-1} l = i \rightarrow a = v_{B[i]} \Big) \land \Big( \bigwedge_{j = 0 \ldots K-1} l \neq j \rightarrow v_{B[j]} = v_{A[j]} \Big) \tag{1} \]

- $skiplist(m, r, a_1, a_2)$ gets translated into:

\[
\begin{array}{l}
ordList(m, getp(m, a_1, a_2, 0)) \;\land\; r = path2set(getp(m, a_1, a_2, 0)) \;\land \\[2pt]
\bigwedge_{i \in 0 \ldots K-1} rd(m, a_2).arr[i] = null \;\land \\[2pt]
\bigwedge_{i \in 0 \ldots K-2} path2set(getp(m, a_1, a_2, i+1)) \subseteq path2set(getp(m, a_1, a_2, i))
\end{array}
\tag{2}
\]
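
The array rules of the translation are mechanical enough to sketch in code. The functions below generate, as plain strings, the clauses described above for $c = mkcell(e,k,A,l)$, $a = A[l]$ and $B = A\{l \leftarrow a\}$. The function names and the textual syntax (`->`, `!=`, `v_A[i]`) are assumptions chosen for readability, not the notation of any particular solver.

```python
def translate_mkcell(c, e, k, A, K):
    """c = mkcell(e, k, A, l)  becomes  c = (e, k, v_A[0], ..., v_A[K-1])."""
    fresh = ", ".join(f"v_{A}[{i}]" for i in range(K))
    return f"{c} = ({e}, {k}, {fresh})"

def translate_read(a, A, l, K):
    """a = A[l]  becomes one implication per level i in [K]."""
    return [f"({l} = {i} -> {a} = v_{A}[{i}])" for i in range(K)]

def translate_update(B, A, l, a, K):
    """B = A{l <- a}  becomes the two groups of implications of clause (1)."""
    hit = [f"({l} = {i} -> {a} = v_{B}[{i}])" for i in range(K)]
    keep = [f"({l} != {j} -> v_{B}[{j}] = v_{A}[{j}])" for j in range(K)]
    return hit + keep

# Illustrative use with K = 3 levels.
clauses = translate_update("B", "A", "i", "tail", 3)
print(" /\\ ".join(clauses))
print(translate_mkcell("c", "e", "k", "A", 3))
```

The output grows linearly in $K$ for each array literal, which is why the bound $K = |V_{level}(\psi)|$ keeps the generated TSL$_K$ formula finite.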

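Note that the formula $\lceil\psi\rceil$ obtained using this translation belongs to the
theory TSL$_K$. For instance,

\[
\lceil\varphi^{NC}\rceil :\quad
\begin{array}{l}
(i = 0 \rightarrow tail = v_{B[0]}) \land (i = 1 \rightarrow tail = v_{B[1]}) \land (i = 2 \rightarrow tail = v_{B[2]}) \;\land \\
(i \neq 0 \rightarrow v_{B[0]} = v_{A[0]}) \land (i \neq 1 \rightarrow v_{B[1]} = v_{A[1]}) \land (i \neq 2 \rightarrow v_{B[2]} = v_{A[2]}) \;\land \\
c = rd(heap, head) \land c = mkcell(e, k, v_{A[0]}, v_{A[1]}, v_{A[2]}) \land l_{new} = i + 1
\end{array}
\]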



The following lemma establishes the correctness of the translation.

Lemma 5. Let $\psi$ be a sanitized TSL formula with no constants. Then, $\psi$ is
satisfiable if and only if $\lceil\psi\rceil$ is also satisfiable.

The main result of this paper is the following decidability theorem, which
follows immediately from Lemma 5, Theorem 1 and the fact that every formula
can be normalized and sanitized.

Theorem 2. The satisfiability problem of (QF) TSL-formulas is decidable.



5 Example: Skiplist Preservation 

We sketch the proof that the implementation given in Fig. 2 preserves the skiplist
shape property. This is a safety property, and can be proved using invariance: the
data structure initially has a skiplist shape and all transitions preserve this shape.
This invariance proof is automatically decomposed into the following verification
conditions:

\[ (\mathrm{Ini}) : \Theta \rightarrow skiplist \qquad\qquad (\mathrm{Con}) : \bigwedge_{i \in 1..79} skiplist \land \tau_i \rightarrow skiplist' \]

where $\Theta$ denotes the initial condition and $\tau_i$ is the transition relation $\tau_i(V, V')$
corresponding to program line $i$, relating variables in the pre-state ($V$) with
variables in the post-state ($V'$). Finally, $skiplist$ and $skiplist'$ are short notation for
$skiplist(heap, r, maxLevel, head, tail)$ and $skiplist(heap', r', maxLevel', head', tail')$
respectively. All VCs discharged are quantifier-free TSL formulas and thus are
verifiable using our decision procedure. We use a single value to denote the key
and value of a cell, hence a cell $(v, A, l)$ represents $(v, v, A, l)$, and we write $rd(c)$ as a short
for $rd(heap, c)$. Condition (Ini) is easy to verify from the initial condition $\Theta$:

\[
\Theta \;\doteq\;
\begin{array}{l}
rd(head) = c_h \land c_h = (-\infty, A_h, 0) \land A_h[0] = tail \land maxLevel = 0 \;\land \\
rd(tail) = c_t \land c_t = (+\infty, A_t, 0) \land A_t[0] = null \land r = \{head, tail\}
\end{array}
\]



To prove the validity of (Con), we negate it and show that $skiplist \land \tau_i \land
\lnot skiplist'$ is unsatisfiable. As shown above, $\lnot skiplist'$ is normalized into five
disjuncts. Two of them are: (NSL1) $(\lnot ordList(m, getp(heap, head, tail, 0)))$; and
(NSL4) $(a \in reg \land rd(heap, a).level > maxLevel)$.

Consider (NSL1). The only offending transition that could satisfy the negation
of the VC is $\tau_{36}$, which connects a new cell to the skiplist. We can automatically
prove that this transition preserves the skiplist order using the following
supporting invariants:

\[
\varphi_{next} \;\doteq\;
\begin{array}{l}
(pc = 21 \rightarrow curr = rd(pred).arr[maxLevel]) \;\land \\
(pc = 22..25, 27..29, 60..63, 65..70 \rightarrow curr = rd(pred).arr[i])
\end{array}
\]

\[
\varphi_{predLess} \;\doteq\; pc = 20..40, 59..79 \;\rightarrow\;
\big( rd(pred).val < v \;\land\; rd(pred).val < rd(tail).val \big)
\]

\[
\varphi_{ord(j)} \;\doteq\;
\left(
\begin{array}{l}
(pc = 22..38, 60..70, 73..75 \land i < j \le maxLevel) \;\lor \\
(pc = 71..72 \land 0 \le j \le maxLevel)
\end{array}
\right)
\rightarrow
\left(
\begin{array}{l}
rd(upd[j]).val < v \;\land \\
rd(rd(upd[j]).arr[j]).val \ge v
\end{array}
\right)
\]

where pc denotes the program counter. We use $(pc = a..b)$ to denote $(pc =
a \lor \cdots \lor pc = b)$. Invariant $\varphi_{next}$ establishes that curr points to the next cell
pointed to by pred at level $i$. Invariant $\varphi_{predLess}$ says that the value pointed to by pred
is always strictly lower than the value we are inserting or removing, and than the value
pointed to by tail. Finally, $\varphi_{ord(j)}$ establishes that, when inside the loops, array upd
at level $j$ points to the last cell whose value is strictly lower than the value to
be inserted or removed. This way, when taking $\tau_{36}$, the decision procedure can
show that the order of elements in the list is preserved.

Checking (NSL4) is even simpler, requiring only the following invariant:

\[ \varphi_{bound} \;\doteq\; (pc = 19..40 \rightarrow lvl \le maxLevel) \land (pc = 34..40 \rightarrow rd(x).level = lvl) \]

A similar approach is followed for all other cases of $\lnot skiplist'$.


6 Conclusion and Future Work

In this paper we have presented TSL, a theory of skiplists of arbitrarily many
levels, useful for automatically proving the VCs generated during the verification of
skiplist implementations. TSL is capable of reasoning about memory, cells, pointers,
regions and reachability, ordered lists and sublists, allowing the description
of the skiplist property and the representation of memory modifications introduced
by the execution of program statements. The main novelty of TSL is that
it is not limited to skiplists of bounded height.

We showed that TSL is decidable by reducing its satisfiability problem to
TSL$_K$ [12] (a decidable theory capable of reasoning about skiplists with a bounded
number of levels), and we illustrated this reduction with some examples. Our reduction
allows us to restrict the reasoning to only the levels being explicitly accessed in the
(sanitized) formula.

Future work includes the temporal verification of sequential and concurrent
skiplist implementations, including industrial implementations like the one in the
java.util.concurrent standard library. We are currently implementing our decision
procedure on top of off-the-shelf SMT solvers such as Yices and Z3. So far, this
implementation provides very promising performance for the automation of
skiplist proofs. However, reporting on this empirical evaluation is left as future work.


A Missing Proofs

Lemma 1. Every TSL-formula is equivalent to a collection of conjunctions of
normalized TSL-literals.

Proof. By case analysis on non-normalized literals. For illustration purposes we
show only some interesting cases. For instance, $\lnot ordList(m, p)$ is equivalent to:

\[
\begin{array}{lr}
(\exists l_1, l_2, zero : level)\,(\exists a_1, a_2 : addr)\,(\exists c_1, c_2 : cell) & \\
(\exists e_1, e_2 : elem)\,(\exists k_1, k_2 : ord)\,(\exists A_1, A_2 : array) & \\
\quad a_1 \in path2set(p) \land a_2 \in path2set(p) \land zero = 0 \;\land & (3) \\
\quad c_1 = rd(m, a_1) \land c_1 = mkcell(e_1, k_1, A_1, l_1) \;\land & (4) \\
\quad a_2 = A_1[zero] \land c_2 = rd(m, a_2) \land c_2 = mkcell(e_2, k_2, A_2, l_2) \;\land & (5) \\
\quad k_2 \preceq k_1 \land k_2 \neq k_1 & (6)
\end{array}
\]

Conjunct (3) establishes that there are two witness addresses $a_1$ and $a_2$ in path
$p$. Literal (4) captures that $c_1$ is the cell to which $a_1$ is mapped in memory $m$.
Conjunct (5) captures that $c_2$ is the cell next to $c_1$ in memory $m$, following
pointers at level 0. That is, $c_2$ immediately follows $c_1$ in heap $m$. Finally, (6)
establishes that the key of $c_1$ is strictly greater than the key of $c_2$, violating the
order of the list.
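
The witnesses required by conjuncts (3) to (6) can be searched for directly on a concrete heap. The sketch below assumes a heap modelled as a Python dictionary from addresses to cells with a `key` and an `arr` list of successor pointers; this encoding and the name `order_violation` are illustrative choices, not part of TSL.

```python
def order_violation(heap, path):
    """Return a pair (a1, a2) of addresses in `path` such that a2 is the
    level-0 successor of a1 and key(a2) is strictly below key(a1),
    or None if the path is ordered."""
    in_path = set(path)
    for a1 in path:
        a2 = heap[a1]["arr"][0]            # a2 = A1[zero], with zero = 0
        if a2 in in_path and heap[a2]["key"] < heap[a1]["key"]:
            return (a1, a2)
    return None

NEG_INF, POS_INF = float("-inf"), float("inf")
broken = {
    "head": {"key": NEG_INF, "arr": ["n1"]},
    "n1":   {"key": 5,       "arr": ["n2"]},
    "n2":   {"key": 3,       "arr": ["tail"]},
    "tail": {"key": POS_INF, "arr": [None]},
}
witness = order_violation(broken, ["head", "n1", "n2", "tail"])
```

In this example the pair `("n1", "n2")` witnesses $\lnot ordList(m, p)$, since key 3 follows key 5 at level 0.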

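As another example, consider literal $\lnot skiplist(m, r, l, a_i, a_e)$. Based on the
interpretation given in Fig. 4, this literal is equivalent to the following:

\[
\begin{array}{lr}
[\,(\exists p : path)\;\; p = getp(m, a_i, a_e, 0) \land \lnot ordList(m, p)\,] \;\lor & \text{(NSL1)} \\[4pt]
[\,(\exists p : path)(\exists s : set)\;\; p = getp(m, a_i, a_e, 0) \land s = path2set(p) \land r \neq s\,] \;\lor & \text{(NSL2)} \\[4pt]
[\, l < 0 \,] \;\lor & \text{(NSL3)} \\[4pt]
\left[
\begin{array}{l}
(\exists a : addr)(\exists e : elem)(\exists k : ord)(\exists A : array)(\exists l' : level)(\exists c : cell) \\
a \in r \land c = rd(m, a) \land c = mkcell(e, k, A, l') \land l < l'
\end{array}
\right] \;\lor & \text{(NSL4)} \\[10pt]
\left[
\begin{array}{l}
(\exists a : addr)(\exists e : elem)(\exists k : ord)(\exists A : array)(\exists l_1, l_2 : level)(\exists c : cell) \\
l \neq 0 \land 0 \le l_2 \land l_2 \le l_1 \;\land \\
c = rd(m, a_e) \land c = mkcell(e, k, A, l_1) \land a = A[l_2] \land a \neq null
\end{array}
\right] \;\lor & \text{(NSL5)} \\[14pt]
\left[
\begin{array}{l}
(\exists l_1, l_2 : level)(\exists p_1, p_2 : path)(\exists s_1, s_2 : set) \\
l \neq 0 \land 0 \le l_1 \land l_1 < l \land l_2 = s(l_1) \;\land \\
p_1 = getp(m, a_i, a_e, l_1) \land p_2 = getp(m, a_i, a_e, l_2) \;\land \\
s_1 = path2set(p_1) \land s_2 = path2set(p_2) \land s_2 \not\subseteq s_1
\end{array}
\right] & \text{(NSL6)}
\end{array}
\]

Literals such as $a \in r$, $\lnot ordList(m, p)$ and $l < 0$ are not normalized, but we leave
them in the previous formulas for simplicity. □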
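
Lemma 2. Let $\mathcal{A}$ and $\mathcal{B}$ be two interpretations of a sanitized formula $\varphi$ that
agree on sorts $\sigma : \{addr, elem, ord, path, set\}$, and such that for every $l \in V_{level}(\varphi)$,
$m \in V_{mem}(\varphi)$, and $a \in \mathcal{A}_{addr}$: $m^{\mathcal{A}}(a).arr^{\mathcal{A}}(l^{\mathcal{A}}) = m^{\mathcal{B}}(a).arr^{\mathcal{B}}(l^{\mathcal{B}})$. It follows that
$reach^{\mathcal{A}}(m^{\mathcal{A}}, a_{init}^{\mathcal{A}}, a_{end}^{\mathcal{A}}, l^{\mathcal{A}}, p^{\mathcal{A}})$ if and only if $reach^{\mathcal{B}}(m^{\mathcal{B}}, a_{init}^{\mathcal{B}}, a_{end}^{\mathcal{B}}, l^{\mathcal{B}}, p^{\mathcal{B}})$.

Proof. Let $\mathcal{A}$ and $\mathcal{B}$ be two interpretations of $\varphi$ satisfying the conditions in
the statement of Lemma 2, and assume $reach^{\mathcal{A}}(m^{\mathcal{A}}, a_{init}^{\mathcal{A}}, a_{end}^{\mathcal{A}}, l^{\mathcal{A}}, p^{\mathcal{A}})$ holds for
some $a_{init}, a_{end} \in V_{addr}(\varphi)$, $m \in V_{mem}(\varphi)$, $p \in V_{path}(\varphi)$. Note that, by assumption,
$a_{init}^{\mathcal{A}} = a_{init}^{\mathcal{B}}$, $a_{end}^{\mathcal{A}} = a_{end}^{\mathcal{B}}$ and $p^{\mathcal{A}} = p^{\mathcal{B}}$. We consider the cases for $p^{\mathcal{A}}$:

- If $p^{\mathcal{A}} = \epsilon$ then $a_{init}^{\mathcal{A}} = a_{end}^{\mathcal{A}}$. Consequently, $p^{\mathcal{B}} = \epsilon$ and $a_{init}^{\mathcal{B}} = a_{end}^{\mathcal{B}}$, so for
interpretation $\mathcal{B}$, the predicate $reach^{\mathcal{B}}(m^{\mathcal{B}}, a_{init}^{\mathcal{B}}, a_{end}^{\mathcal{B}}, l^{\mathcal{B}}, p^{\mathcal{B}})$ also holds.
- The other case is $p^{\mathcal{A}} = [a_1 \ldots a_n]$ with $a_1 = a_{init}^{\mathcal{A}}$ and $m^{\mathcal{A}}(a_n).arr^{\mathcal{A}}(l^{\mathcal{A}}) =
a_{end}^{\mathcal{A}}$, and for every $r < n$, $m^{\mathcal{A}}(a_r).arr^{\mathcal{A}}(l^{\mathcal{A}}) = a_{r+1}$. It follows, by the assumption of the lemma, that
$m^{\mathcal{B}}(a_n).arr^{\mathcal{B}}(l^{\mathcal{B}}) = a_{end}^{\mathcal{B}}$, and for every $r < n$, $m^{\mathcal{B}}(a_r).arr^{\mathcal{B}}(l^{\mathcal{B}}) = a_{r+1}$.
Hence, $reach^{\mathcal{B}}(m^{\mathcal{B}}, a_{init}^{\mathcal{B}}, a_{end}^{\mathcal{B}}, l^{\mathcal{B}}, p^{\mathcal{B}})$.

The other direction follows similarly. □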
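
The following sketch spells out the reach predicate as used in the proof above and checks the statement of Lemma 2 on a tiny example. The heap encoding (a dictionary from addresses to lists of per-level successors) and the two example heaps are assumptions made for illustration; the level variable is interpreted as 2 in the first heap and as 1 in the second, and the pointers at those two levels coincide.

```python
def reach(heap, a_init, a_end, level, path):
    """Path [a_1..a_n] witnesses reachability if a_1 = a_init, each a_r
    points to a_{r+1} at `level`, and a_n points to a_end at `level`.
    The empty path witnesses reachability only when a_init = a_end."""
    if not path:
        return a_init == a_end
    if path[0] != a_init:
        return False
    for current, successor in zip(path, path[1:] + [a_end]):
        if heap[current][level] != successor:
            return False
    return True

# Interpretation A uses level 2, interpretation B uses level 1;
# the pointers stored at those levels are identical.
heap_a = {"h": ["x", "t", "t"], "x": ["t", None, None], "t": [None, None, None]}
heap_b = {"h": ["x", "t"],      "x": ["t", None],       "t": [None, None]}

same = reach(heap_a, "h", "t", 2, ["h"]) == reach(heap_b, "h", "t", 1, ["h"])
```

Because the predicate only inspects pointers at the given level, agreement on that level is all that is needed for the two interpretations to agree on reach.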

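Lemma 3 (Gap-reduction). If there is a model $\mathcal{A}$ of $\varphi$ with a gap at $n$, then
there is a model $\mathcal{B}$ of $\varphi$ such that, for every $l \in V_{level}(\varphi)$,

\[
l^{\mathcal{B}} =
\begin{cases}
l^{\mathcal{A}} & \text{if } l^{\mathcal{A}} < n \\
l^{\mathcal{A}} - 1 & \text{if } l^{\mathcal{A}} > n
\end{cases}
\]

The number of gaps in $\mathcal{B}$ is one less than in $\mathcal{A}$.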

Proof. Let $\mathcal{A}$ be a model of $\varphi$ with a gap at $n$. We build a model $\mathcal{B}$ with the
condition in the lemma as follows. $\mathcal{B}$ agrees with $\mathcal{A}$ on addr, elem, ord, path, set. In
particular, $v^{\mathcal{B}} = v^{\mathcal{A}}$ for variables of these sorts. For the other sorts we let $\mathcal{B}_\sigma = \mathcal{A}_\sigma$
for $\sigma$ = level, array, cell, mem. We define transformation maps for elements of the
corresponding domains as follows:

\[
\beta_{level}(j) =
\begin{cases}
j & \text{if } j < n \\
j - 1 & \text{otherwise}
\end{cases}
\qquad
\beta_{array}(A)(i) =
\begin{cases}
A(i) & \text{if } i < n \\
A(i + 1) & \text{if } i \ge n
\end{cases}
\]

\[
\beta_{cell}((e, k, A, l)) = (e, k, \beta_{array}(A), \beta_{level}(l))
\qquad
\beta_{mem}(m)(a) = \beta_{cell}(m(a))
\]

Now we are ready to define the valuations of variables $l$ : level, $A$ : array,
$c$ : cell and $m$ : mem:

\[ l^{\mathcal{B}} = \beta_{level}(l^{\mathcal{A}}) \qquad A^{\mathcal{B}} = \beta_{array}(A^{\mathcal{A}}) \qquad c^{\mathcal{B}} = \beta_{cell}(c^{\mathcal{A}}) \qquad m^{\mathcal{B}} = \beta_{mem}(m^{\mathcal{A}}) \]

The interpretation of all functions and predicates is preserved from $\mathcal{A}$.

The next step is to show that $\mathcal{B}$ is indeed a model of $\varphi$. All literals of the
following form hold in $\mathcal{B}$ if they hold in $\mathcal{A}$, because the valuations and
interpretations of functions and predicates of the corresponding sorts are preserved:



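\[
\begin{array}{lll}
e_1 \neq e_2 & a_1 \neq a_2 & \\
a = null & c = error & \\
k_1 \neq k_2 & k_1 \preceq k_2 & \\
 & l_1 < l_2 & l = q \\
s = \{a\} & s_1 = s_2 \cup s_3 & s_1 = s_2 \setminus s_3 \\
p_1 \neq p_2 & p = [a] & p_1 = rev(p_2) \\
s = path2set(p) & append(p_1, p_2, p_3) & \lnot append(p_1, p_2, p_3) \\
ordList(m, p) & &
\end{array}
\]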



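A simple argument shows that literals of the form $c = rd(m, a)$ and $m_2 =
upd(m_1, a, c)$ hold in $\mathcal{B}$ if they do in $\mathcal{A}$, because the same transformations are
performed on both sides of the equation. The remaining literals are:

- $c = mkcell(e, k, A, l)$: Assuming $c^{\mathcal{A}} = mkcell(e^{\mathcal{A}}, k^{\mathcal{A}}, A^{\mathcal{A}}, l^{\mathcal{A}})$,

\[ mkcell(e^{\mathcal{B}}, k^{\mathcal{B}}, A^{\mathcal{B}}, l^{\mathcal{B}}) = mkcell(e^{\mathcal{A}}, k^{\mathcal{A}}, \beta_{array}(A^{\mathcal{A}}), \beta_{level}(l^{\mathcal{A}})) = \beta_{cell}(c^{\mathcal{A}}) = c^{\mathcal{B}} \]

- $a = A[l]$: Assume $a^{\mathcal{A}} = A^{\mathcal{A}}[l^{\mathcal{A}}]$. There are two cases for $l^{\mathcal{A}}$. First, $l^{\mathcal{A}} < n$.
Then,

\[ A^{\mathcal{B}}[l^{\mathcal{B}}] = A^{\mathcal{A}}[l^{\mathcal{A}}] = a^{\mathcal{A}} = a^{\mathcal{B}} \]

Second, $l^{\mathcal{A}} > n$. Then,

\[ A^{\mathcal{B}}[l^{\mathcal{B}}] = A^{\mathcal{A}}[(l^{\mathcal{A}} - 1) + 1] = A^{\mathcal{A}}[l^{\mathcal{A}}] = a^{\mathcal{A}} = a^{\mathcal{B}} \]

- $B = A\{l \leftarrow a\}$: We assume $B^{\mathcal{A}} = A^{\mathcal{A}}\{l^{\mathcal{A}} \leftarrow a^{\mathcal{A}}\}$. Consider an arbitrary
$m \in \mathbb{N}$. If $m = l^{\mathcal{B}}$ then

\[ (A^{\mathcal{B}}\{l^{\mathcal{B}} \leftarrow a^{\mathcal{B}}\})(m) = (\beta_{array}(A^{\mathcal{A}})\{l^{\mathcal{B}} \leftarrow a^{\mathcal{B}}\})(m) = a^{\mathcal{B}} \]

If $m \neq l^{\mathcal{B}}$ and $m < n$ then

\[
\begin{array}{rl}
(A^{\mathcal{B}}\{l^{\mathcal{B}} \leftarrow a^{\mathcal{B}}\})(m) & = (\beta_{array}(A^{\mathcal{A}})\{l^{\mathcal{B}} \leftarrow a^{\mathcal{B}}\})(m) \\
 & = (\beta_{array}(A^{\mathcal{A}}))(m) = A^{\mathcal{A}}(m) = B^{\mathcal{A}}(m) \\
 & = \beta_{array}(B^{\mathcal{A}})(m) = B^{\mathcal{B}}(m)
\end{array}
\]

Finally, the last case is $m \neq l^{\mathcal{B}}$ and $m \ge n$. In this case:

\[
\begin{array}{rl}
(A^{\mathcal{B}}\{l^{\mathcal{B}} \leftarrow a^{\mathcal{B}}\})(m) & = (\beta_{array}(A^{\mathcal{A}})\{l^{\mathcal{B}} \leftarrow a^{\mathcal{B}}\})(m) \\
 & = (\beta_{array}(A^{\mathcal{A}}))(m) = A^{\mathcal{A}}(m + 1) = B^{\mathcal{A}}(m + 1) \\
 & = \beta_{array}(B^{\mathcal{A}})(m) = B^{\mathcal{B}}(m)
\end{array}
\]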
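
- $s = addr2set(m, a, l)$ and $p = getp(m, a_1, a_2, l)$: We first prove that for all
variables $m$ and $l$, addresses $a_{init}$, $a_{end}$ and paths $p$, $reach(m^{\mathcal{A}}, a_{init}, a_{end}, l^{\mathcal{A}}, p)$
if and only if $reach(m^{\mathcal{B}}, a_{init}, a_{end}, l^{\mathcal{B}}, p)$. Assume $reach(m^{\mathcal{A}}, a_{init}, a_{end}, l^{\mathcal{A}}, p)$;
then either $a_{init} = a_{end}$ and $p = \epsilon$, in which case $reach(m^{\mathcal{B}}, a_{init}, a_{end}, l^{\mathcal{B}}, p)$,
or there is a sequence of addresses $a_1, \ldots, a_N$ with

(a) $p = [a_1 \ldots a_N]$
(b) $a_1 = a_{init}$
(c) $m^{\mathcal{A}}(a_r).arr^{\mathcal{A}}(l^{\mathcal{A}}) = a_{r+1}$, for $r < N$
(d) $m^{\mathcal{A}}(a_N).arr^{\mathcal{A}}(l^{\mathcal{A}}) = a_{end}$

Take an arbitrary $r < N$. Either $l^{\mathcal{A}} < n$ or $l^{\mathcal{A}} > n$ (recall that $l^{\mathcal{A}}$ is either
strictly under or strictly over the gap). In either case,

\[ m^{\mathcal{B}}(a_r).arr^{\mathcal{B}}(l^{\mathcal{B}}) = m^{\mathcal{A}}(a_r).arr^{\mathcal{A}}(l^{\mathcal{A}}) = a_{r+1} \]

Also, $m^{\mathcal{B}}(a_N).arr^{\mathcal{B}}(l^{\mathcal{B}}) = m^{\mathcal{A}}(a_N).arr^{\mathcal{A}}(l^{\mathcal{A}}) = a_{end}$. Hence, conditions (a),
(b), (c) and (d) hold for $\mathcal{B}$ and $reach(m^{\mathcal{B}}, a_{init}, a_{end}, l^{\mathcal{B}}, p)$. Informally, predicate
reach only depends on pointers at level $l$, which are preserved. The
other direction holds similarly. From the preservation of the reach predicate
it follows that, if $addr2set(m^{\mathcal{A}}, a^{\mathcal{A}}, l^{\mathcal{A}}) = s^{\mathcal{A}}$ then

\[
\begin{array}{rl}
addr2set(m^{\mathcal{B}}, a^{\mathcal{B}}, l^{\mathcal{B}}) & = \{ a' \mid \exists p \in \mathcal{B}_{path} \,.\, (m^{\mathcal{B}}, a^{\mathcal{B}}, a', l^{\mathcal{B}}, p) \in reach^{\mathcal{B}} \} \\
 & = \{ a' \mid \exists p \in \mathcal{A}_{path} \,.\, (m^{\mathcal{A}}, a^{\mathcal{A}}, a', l^{\mathcal{A}}, p) \in reach^{\mathcal{A}} \} \\
 & = addr2set(m^{\mathcal{A}}, a^{\mathcal{A}}, l^{\mathcal{A}}) = s^{\mathcal{A}} = s^{\mathcal{B}}
\end{array}
\]

Finally, assume $p^{\mathcal{A}} = getp(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, l^{\mathcal{A}})$. If $(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, l^{\mathcal{A}}, p^{\mathcal{A}}) \in reach^{\mathcal{A}}$
then $(m^{\mathcal{B}}, a_1^{\mathcal{B}}, a_2^{\mathcal{B}}, l^{\mathcal{B}}, p^{\mathcal{B}}) \in reach^{\mathcal{B}}$ and hence $p^{\mathcal{B}} = getp(m^{\mathcal{B}}, a_1^{\mathcal{B}}, a_2^{\mathcal{B}}, l^{\mathcal{B}})$. The
other case is $\epsilon = getp(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, l^{\mathcal{A}})$ when

for no path $p$, $(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, l^{\mathcal{A}}, p) \in reach^{\mathcal{A}}$,

but then also

for no path $p$, $(m^{\mathcal{B}}, a_1^{\mathcal{B}}, a_2^{\mathcal{B}}, l^{\mathcal{B}}, p) \in reach^{\mathcal{B}}$,

and then $\epsilon = getp(m^{\mathcal{B}}, a_1^{\mathcal{B}}, a_2^{\mathcal{B}}, l^{\mathcal{B}})$, as desired.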

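- $skiplist(m, r, l, a_1, a_2)$: We assume $skiplist(m^{\mathcal{A}}, r^{\mathcal{A}}, l^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}})$. This implies:

• $ordList^{\mathcal{A}}(m^{\mathcal{A}}, getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, 0))$. Let $p$ be an element of $\mathcal{A}_{path}$ such that
$p = getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, 0)$. As shown previously, $p = getp^{\mathcal{B}}(m^{\mathcal{B}}, a_1^{\mathcal{B}}, a_2^{\mathcal{B}}, 0)$, and
then $ordList^{\mathcal{B}}(m^{\mathcal{B}}, getp^{\mathcal{B}}(m^{\mathcal{B}}, a_1^{\mathcal{B}}, a_2^{\mathcal{B}}, 0))$ holds because $ordList^{\mathcal{A}}(m^{\mathcal{A}}, getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, 0))$
does.

• $r^{\mathcal{A}} = path2set^{\mathcal{A}}(getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, 0))$. Again $r^{\mathcal{B}} = path2set^{\mathcal{B}}(getp^{\mathcal{B}}(m^{\mathcal{B}}, a_1^{\mathcal{B}}, a_2^{\mathcal{B}}, 0))$
because $getp^{\mathcal{B}}(m^{\mathcal{B}}, a_1^{\mathcal{B}}, a_2^{\mathcal{B}}, 0) = getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, 0)$.

• $0 \le l^{\mathcal{A}}$, which implies $0 \le l^{\mathcal{B}}$.

• $\forall a \in r^{\mathcal{A}} \,.\, m^{\mathcal{A}}(a).max^{\mathcal{A}} \le l^{\mathcal{A}}$. Since $r^{\mathcal{B}} = r^{\mathcal{A}}$ and $m^{\mathcal{B}}(a) = \beta_{cell}(m^{\mathcal{A}}(a))$
it is enough to consider two cases. First, $m^{\mathcal{A}}(a).max^{\mathcal{A}} = l^{\mathcal{A}}$, in which case
$m^{\mathcal{B}}(a).max^{\mathcal{B}} = l^{\mathcal{B}}$. Second, $m^{\mathcal{A}}(a).max^{\mathcal{A}} < l^{\mathcal{A}}$, in which case $m^{\mathcal{B}}(a).max^{\mathcal{B}} \le
l^{\mathcal{B}}$, because $\beta_{level}$ is monotone.

• If $(0 = l^{\mathcal{A}})$ then $(0 = l^{\mathcal{B}})$.

• If $(0 < l^{\mathcal{A}})$, and for all $i$ from 0 to $l^{\mathcal{A}}$:

\[ m^{\mathcal{A}}(a_2^{\mathcal{A}}).arr^{\mathcal{A}}(i) = null \;\land \tag{7} \]

\[ path2set^{\mathcal{A}}(getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, i + 1)) \subseteq path2set^{\mathcal{A}}(getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, i)) \tag{8} \]

Then $0 < l^{\mathcal{B}}$ (because 0 is never removed).

This concludes the proof. □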
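
The transformation used in the proof of Lemma 3 amounts to deleting one unused level. The sketch below applies it to a valuation of level variables and to finite arrays represented as Python lists; the function name `remove_gap`, the dictionary encoding and the example values are illustrative assumptions.

```python
def remove_gap(levels, arrays, n):
    """Remove the gap at level n: levels above n move down by one
    (beta_level) and index n is dropped from every array (beta_array)."""
    assert n not in levels.values(), "n must be a gap: no variable equals n"
    new_levels = {v: (l if l < n else l - 1) for v, l in levels.items()}
    new_arrays = {name: arr[:n] + arr[n + 1:] for name, arr in arrays.items()}
    return new_levels, new_arrays

levels = {"i": 0, "j": 1, "top": 3}              # no variable is mapped to 2
arrays = {"A": ["a0", "a1", "unused", "a3"]}
new_levels, new_arrays = remove_gap(levels, arrays, 2)

# Every read A[l] keeps its value after the transformation.
preserved = all(
    arrays["A"][levels[v]] == new_arrays["A"][new_levels[v]] for v in levels
)
```

Repeating this step once per gap yields a gapless model, which is how the bound on the number of relevant levels is obtained.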



Theorem 1. A TSL formula $\varphi$ is satisfiable if and only if for some arrangement
$\alpha$, both $(\varphi^{PA} \land \alpha)$ and $(\varphi^{NC} \land \alpha)$ are satisfiable.

Proof. The "⇒" direction follows immediately, since a model of $\varphi$ contains a
model of its subformulas $\varphi^{PA}$ and $\varphi^{NC}$, and a model of $\varphi^{PA}$ induces a satisfying
order arrangement $\alpha$.

For "⇐", let $\alpha$ be an order arrangement for which both $(\varphi^{PA} \land \alpha)$ and
$(\varphi^{NC} \land \alpha)$ are satisfiable, and let $\mathcal{A}$ be a model of $(\varphi^{NC} \land \alpha)$ and $\mathcal{B}$ be a
model of $(\varphi^{PA} \land \alpha)$. By Corollary 1 we assume that $\mathcal{A}$ is a gapless model. In
particular, for all variables $l \in V_{level}(\varphi)$, $l^{\mathcal{A}} < K$, where $K = |V_{level}(\varphi)|$,
and for all cells $c \in \mathcal{A}_{cell}$ with $c = (e, k, A, l)$, $l < K$. Model $\mathcal{B}$ of $(\varphi^{PA} \land \alpha)$
assigns values to variables from $V_{level}(\varphi)$, consistently with $\alpha$. The obstacle is
that the values for levels in $\mathcal{A}$ and in $\mathcal{B}$ may be different, so the models cannot
be immediately merged. We will build a model $\mathcal{C}$ of $\varphi$ using $\mathcal{A}$ and $\mathcal{B}$. Let $K^{PA}$
be the largest value assigned by $\mathcal{B}$ to any variable from $V_{level}(\varphi)$. We start by
defining the following maps:

\[
\begin{array}{ll}
f : [K] \rightarrow [K^{PA}] & \qquad f^* : [K^{PA}] \rightarrow [K] \\
\;\;\;\; l^{\mathcal{A}} \mapsto l^{\mathcal{B}} & \qquad\;\;\;\; n \mapsto \max\{ k \in [K] \mid f(k) \le n \}
\end{array}
\]

Essentially, $f^*$ provides the level from $\mathcal{A}$ that will be used to fill the missing
level in model $\mathcal{C}$. Some easy facts that follow from the choice of the definition of
$f$ and $f^*$ are that, for every variable $l$ in $V_{level}(\varphi)$, $f^*(f(l^{\mathcal{A}})) = l^{\mathcal{A}}$. Also, every
literal of the form $B = A\{l \leftarrow a\}$ satisfies that $f^*(l + 1) = f^*(l) + 1$ because a
sanitized formula $\varphi$ contains a literal $l_{new} = l + 1$ for every such $B = A\{l \leftarrow a\}$.

We show now how to build a model $\mathcal{C}$ of $\varphi$. The only literals missing in $\varphi^{NC}$
with respect to $\varphi$ are literals of the form $l = q$ for constant level $q$. $\mathcal{C}$ agrees with
$\mathcal{A}$ on sorts addr, elem, ord, path, set. Also the domain $\mathcal{C}_{level}$ is the naturals with
order, and $\mathcal{C}_{cell} = \mathcal{C}_{elem} \times \mathcal{C}_{ord} \times \mathcal{C}_{array} \times \mathcal{C}_{level}$ and $\mathcal{C}_{mem} = \mathcal{C}_{cell}^{\mathcal{C}_{addr}}$. For level variables,
we let $v^{\mathcal{C}} = v^{\mathcal{B}}$, where $v^{\mathcal{B}}$ is the interpretation of variable $v$ in $\mathcal{B}$, the model of
$(\varphi^{PA} \land \alpha)$. Note that $v^{\mathcal{C}} = v^{\mathcal{B}} = f(v^{\mathcal{A}})$. For arrays, we define $\mathcal{C}_{array}$ to be the
set of arrays of addresses indexed by naturals, and define the transformation
$\beta_{array} : \mathcal{A}_{array} \rightarrow \mathcal{C}_{array}$ as follows: $\beta_{array}(A)(i) = A(f^*(i))$.

Then, elements of sort cell $c : (e, k, A, l)$ are transformed into $\beta_{cell}(c) =
(e, k, \beta_{array}(A), f(l))$. Variables of sort array $A$ are interpreted as $A^{\mathcal{C}} = \beta_{array}(A^{\mathcal{A}})$
and variables of sort cell as $c^{\mathcal{C}} = \beta_{cell}(c^{\mathcal{A}})$. Finally, heaps are transformed by returning
the transformed cell: for $v \in V_{mem}$, $v^{\mathcal{C}}(a) = \beta_{cell}(v^{\mathcal{A}}(a))$. We only need to show
that $\mathcal{C}$ is indeed a model of $\varphi$. Interestingly, all literals $l = q$ in $\mathcal{C}$ are immediately
satisfied because $l^{\mathcal{C}} = l^{\mathcal{B}}$ and $q^{\mathcal{C}} = q^{\mathcal{B}}$, and the literal $(l = q)$ holds in the model
$\mathcal{B}$ of $\varphi^{PA}$. The same holds for all literals in $\varphi$ of the form $l_1 < l_2$, $l_1 = l_2 + 1$ and
$l_1 \neq l_2$: these literals hold in $\mathcal{C}$ because they hold in $\mathcal{B}$. The following literals
also hold in $\mathcal{C}$ because they hold in $\mathcal{A}$ and their subformulas either receive the
same values in $\mathcal{C}$ as in $\mathcal{A}$ or the transformations are the same:
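
\[
\begin{array}{lll}
e_1 \neq e_2 & a_1 \neq a_2 & \\
a = null & c = error & c = rd(m, a) \\
k_1 \neq k_2 & k_1 \preceq k_2 & m_2 = upd(m_1, a, c) \\
c = mkcell(e, k, A, l) & & \\
s = \{a\} & s_1 = s_2 \cup s_3 & s_1 = s_2 \setminus s_3 \\
p_1 \neq p_2 & p = [a] & p_1 = rev(p_2) \\
s = path2set(p) & append(p_1, p_2, p_3) & \lnot append(p_1, p_2, p_3) \\
ordList(m, p) & &
\end{array}
\]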

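Finally, observe that $(s = addr2set(m, a, l))$ and $(p = getp(m, a_1, a_2, l))$ hold
in $\mathcal{C}$ whenever they hold in $\mathcal{A}$, as they follow directly from Lemma 2. The
remaining literals are:

- $a = A[l]$: assume $a^{\mathcal{A}} = A^{\mathcal{A}}[l^{\mathcal{A}}]$. Then, in $\mathcal{C}$, $a^{\mathcal{C}} = a^{\mathcal{A}}$ and

\[ A^{\mathcal{C}}[l^{\mathcal{C}}] = \beta_{array}(A^{\mathcal{A}})[l^{\mathcal{C}}] = A^{\mathcal{A}}(f^*(l^{\mathcal{C}})) = A^{\mathcal{A}}(f^*(f(l^{\mathcal{A}}))) = A^{\mathcal{A}}(l^{\mathcal{A}}) = a^{\mathcal{A}} = a^{\mathcal{C}}. \]

- $B = A\{l \leftarrow a\}$: We distinguish two cases. First, let $n = l^{\mathcal{C}}$. Then,

\[ A^{\mathcal{C}}\{l^{\mathcal{C}} \leftarrow a^{\mathcal{C}}\}(n) = A^{\mathcal{C}}\{l^{\mathcal{C}} \leftarrow a^{\mathcal{C}}\}(l^{\mathcal{C}}) = a^{\mathcal{C}}, \text{ and} \]

\[ B^{\mathcal{C}}(n) = B^{\mathcal{C}}(l^{\mathcal{C}}) = B^{\mathcal{A}}(f^*(l^{\mathcal{C}})) = B^{\mathcal{A}}(f^*(f(l^{\mathcal{A}}))) = B^{\mathcal{A}}(l^{\mathcal{A}}) = a^{\mathcal{A}} = a^{\mathcal{C}}. \]

The second case is $n \neq l^{\mathcal{C}}$. Then $(A^{\mathcal{C}}\{l^{\mathcal{C}} \leftarrow a^{\mathcal{C}}\})(n) = A^{\mathcal{C}}(n) = A^{\mathcal{A}}(f^*(n))$,
and $B^{\mathcal{C}}(n) = B^{\mathcal{A}}(f^*(n))$. Now, $f^*(n) \neq l^{\mathcal{A}}$. To show this we consider the
two cases for $n \neq l^{\mathcal{C}}$:

• If $n < l^{\mathcal{C}}$ then, since $f^*(n) = \max\{k \in [K] \mid f(k) \le n\}$ by definition,
$f(l^{\mathcal{A}}) = l^{\mathcal{C}} > n$ and $f^*(n) < l^{\mathcal{A}}$, which implies $f^*(n) \neq l^{\mathcal{A}}$.

• If $n > l^{\mathcal{C}}$ then $n \ge l^{\mathcal{C}} + 1$. As reasoned above there is a different literal
$l_{new} = l + 1$ for which $f^*(n) \ge f^*(l_{new}^{\mathcal{C}}) > f^*(l^{\mathcal{C}}) = l^{\mathcal{A}}$.

Since in both cases $f^*(n) \neq l^{\mathcal{A}}$, then

\[ B^{\mathcal{C}}(n) = B^{\mathcal{A}}(f^*(n)) = A^{\mathcal{A}}(f^*(n)) = A^{\mathcal{A}}\{l^{\mathcal{A}} \leftarrow a\}(f^*(n)) = A^{\mathcal{C}}\{l^{\mathcal{C}} \leftarrow a\}(n) \]

Essentially, the choice to introduce a variable $l_{new} = l + 1$ restricts the
replication of identical levels to only the level $l$ in $B = A\{l \leftarrow a\}$. All higher
and lower levels are replicas of levels different than $l$ (where $A$ and $B$ agree
as in model $\mathcal{A}$).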

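- $s = addr2set(m, a, l)$ and $p = getp(m, a_1, a_2, l)$: it is easy to show by induction
on the length of paths that, for all $l^{\mathcal{A}}$:

\[ (m^{\mathcal{A}}, a^{\mathcal{A}}, b^{\mathcal{A}}, l^{\mathcal{A}}, p^{\mathcal{A}}) \in reach^{\mathcal{A}} \quad\text{iff}\quad (m^{\mathcal{C}}, a^{\mathcal{C}}, b^{\mathcal{C}}, l^{\mathcal{C}}, p^{\mathcal{C}}) \in reach^{\mathcal{C}} \tag{9} \]

It follows that $s^{\mathcal{A}} = addr2set(m^{\mathcal{A}}, a^{\mathcal{A}}, l^{\mathcal{A}})$ implies $s^{\mathcal{C}} = addr2set(m^{\mathcal{C}}, a^{\mathcal{C}}, l^{\mathcal{C}})$.
Also $p^{\mathcal{A}} = getp(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, l^{\mathcal{A}})$ implies that $p^{\mathcal{C}} = getp(m^{\mathcal{C}}, a_1^{\mathcal{C}}, a_2^{\mathcal{C}}, l^{\mathcal{C}})$.
Essentially, since level $l^{\mathcal{C}}$ in $\mathcal{C}$ is a replica of level $l^{\mathcal{A}}$ in $\mathcal{A}$, the transitive closure
of following pointers yields the same paths (for getp) and the same
sets (for addr2set).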

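- $skiplist(m, r, l, a_1, a_2)$: We assume $skiplist(m^{\mathcal{A}}, r^{\mathcal{A}}, l^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}})$. This implies
all of the following in $\mathcal{A}$:

• $ordList^{\mathcal{A}}(m^{\mathcal{A}}, getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, 0))$. Let $p$ be such that $p = getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, 0)$.
As a consequence of (9), $p = getp^{\mathcal{C}}(m^{\mathcal{C}}, a_1^{\mathcal{C}}, a_2^{\mathcal{C}}, 0)$, and then
$ordList^{\mathcal{A}}(m^{\mathcal{A}}, getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, 0))$ implies $ordList^{\mathcal{C}}(m^{\mathcal{C}}, getp^{\mathcal{C}}(m^{\mathcal{C}}, a_1^{\mathcal{C}}, a_2^{\mathcal{C}}, 0))$.

• $r^{\mathcal{A}} = path2set^{\mathcal{A}}(getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, 0))$. Because $getp^{\mathcal{C}}(m^{\mathcal{C}}, a_1^{\mathcal{C}}, a_2^{\mathcal{C}}, 0) =
getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, 0)$, once more $r^{\mathcal{C}} = path2set^{\mathcal{C}}(getp^{\mathcal{C}}(m^{\mathcal{C}}, a_1^{\mathcal{C}}, a_2^{\mathcal{C}}, 0))$.

• $0 \le l^{\mathcal{A}}$, which implies $0 \le l^{\mathcal{C}}$.

• $\forall a \in r^{\mathcal{A}} \,.\, m^{\mathcal{A}}(a).max^{\mathcal{A}} \le l^{\mathcal{A}}$. Since $r^{\mathcal{C}} = r^{\mathcal{A}}$ and
$m^{\mathcal{C}}(a) = \beta_{cell}(m^{\mathcal{A}}(a))$ it is enough to consider two cases. First, $m^{\mathcal{A}}(a).max^{\mathcal{A}} =
l^{\mathcal{A}}$, in which case $m^{\mathcal{C}}(a).max^{\mathcal{C}} = l^{\mathcal{C}}$. Second, $m^{\mathcal{A}}(a).max^{\mathcal{A}} < l^{\mathcal{A}}$, in which
case $m^{\mathcal{C}}(a).max^{\mathcal{C}} \le l^{\mathcal{C}}$, because $f$ preserves the order of levels.

• If $(0 = l^{\mathcal{A}})$ then $(0 = l^{\mathcal{C}})$.

• If $(0 < l^{\mathcal{A}})$, and for all $i$ from 0 to $l^{\mathcal{A}}$:

\[
\begin{array}{l}
m^{\mathcal{A}}(a_2^{\mathcal{A}}).arr^{\mathcal{A}}(i) = null \;\land \\
path2set^{\mathcal{A}}(getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, i + 1)) \subseteq path2set^{\mathcal{A}}(getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, i))
\end{array}
\]

Then $0 < l^{\mathcal{C}}$. Consider an arbitrary $i$ between 0 and $l^{\mathcal{C}}$. It follows that
$f^*(i) \le f^*(l^{\mathcal{C}})$, so $f^*(i) \le l^{\mathcal{A}}$ and then

\[ m^{\mathcal{C}}(a_2^{\mathcal{C}}).arr^{\mathcal{C}}(i) = m^{\mathcal{A}}(a_2^{\mathcal{A}}).arr^{\mathcal{A}}(f^*(i)) = null^{\mathcal{A}} = null^{\mathcal{C}} \]

\[
\begin{array}{rl}
path2set^{\mathcal{C}}(getp^{\mathcal{C}}(m^{\mathcal{C}}, a_1^{\mathcal{C}}, a_2^{\mathcal{C}}, i + 1)) & = path2set^{\mathcal{A}}(getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, f^*(i + 1))) \\
 & \subseteq path2set^{\mathcal{A}}(getp^{\mathcal{A}}(m^{\mathcal{A}}, a_1^{\mathcal{A}}, a_2^{\mathcal{A}}, f^*(i))) \\
 & = path2set^{\mathcal{C}}(getp^{\mathcal{C}}(m^{\mathcal{C}}, a_1^{\mathcal{C}}, a_2^{\mathcal{C}}, i))
\end{array}
\]

This concludes the proof. □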
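
The maps $f$ and $f^*$ used in the proof of Theorem 1 can be tried out on a small example. The sketch below assumes two valuations of the same level variables that respect the same order arrangement, the first one gapless. It also assumes that some variable is mapped to 0 in both models so that $f^*$ is defined on every index; this assumption, the variable names and the list encoding of arrays are illustrative and not taken from the paper.

```python
def make_maps(levels_a, levels_b):
    """f sends each level used in the gapless model A to the value the same
    variable takes in B; f_star(n) is the largest A-level k with f(k) <= n."""
    f = {levels_a[v]: levels_b[v] for v in levels_a}

    def f_star(n):
        return max(k for k in f if f[k] <= n)

    return f, f_star

levels_a = {"zero": 0, "i": 1, "l_new": 2, "maxLevel": 3}   # gapless model A
levels_b = {"zero": 0, "i": 3, "l_new": 4, "maxLevel": 7}   # Presburger model B
f, f_star = make_maps(levels_a, levels_b)

# beta_array: level n of the merged model C replicates level f_star(n) of A.
array_a = ["a0", "a1", "a2", "a3"]
array_c = [array_a[f_star(n)] for n in range(max(f.values()) + 1)]
```

In this example `array_c` is `["a0", "a0", "a0", "a1", "a2", "a2", "a2", "a3"]`: the levels of $\mathcal{A}$ are stretched to the larger level values chosen by $\mathcal{B}$, and each read $A[l]$ returns the same address in $\mathcal{C}$ as in $\mathcal{A}$.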

Lemma 5. Let $\psi$ be a sanitized TSL formula with no constants. Then, $\psi$ is
satisfiable if and only if $\lceil\psi\rceil$ is also satisfiable.

Proof. Directly from Lemmas 6 and 7 below, which prove each direction
separately. □

Lemma 6. Let $\varphi$ be a normalized set of TSL literals with no constants. Then,
if $\varphi$ is satisfiable then $\lceil\varphi\rceil$ is also satisfiable.

Proof. Assume $\varphi$ is satisfiable, which implies (by Corollary 1) that $\varphi$ has a
gapless model $\mathcal{A}$. This model $\mathcal{A}$ satisfies that for every natural $i$ from 0 to $K - 1$
there is a level $l \in V_{level}(\varphi)$ with $l^{\mathcal{A}} = i$.






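Building a Model $\mathcal{B}$. We now construct a model $\mathcal{B}$ of $\lceil\varphi\rceil$. For the domains:

\[ \mathcal{B}_{addr} = \mathcal{A}_{addr} \quad \mathcal{B}_{elem} = \mathcal{A}_{elem} \quad \mathcal{B}_{ord} = \mathcal{A}_{ord} \quad \mathcal{B}_{path} = \mathcal{A}_{path} \quad \mathcal{B}_{set} = \mathcal{A}_{set} \]

and

\[ \mathcal{B}_{level} = [K] \qquad \mathcal{B}_{cell} = \mathcal{B}_{elem} \times \mathcal{B}_{ord} \times \mathcal{B}_{addr}^{K} \qquad \mathcal{B}_{mem} = \mathcal{B}_{cell}^{\mathcal{B}_{addr}} \]

For the variables, we let $v^{\mathcal{B}} = v^{\mathcal{A}}$ for sorts addr, elem, ord, path and set. For
level, we assign $l^{\mathcal{B}} = l^{\mathcal{A}}$, which is guaranteed to be within 0 and $K - 1$. For cell,
let $c = (e, k, A, l)$ be an element of $\mathcal{A}_{cell}$. The following function maps $c$ into an
element of $\mathcal{B}_{cell}$:

\[ \alpha(e, k, A, l) = (e, k, A(0), \ldots, A(K - 1)) \]

Essentially, cells only record information of relevant levels, which are those levels
for which there is a level variable; all upper levels are ignored. Every variable
$v$ of sort cell is interpreted as $v^{\mathcal{B}} = \alpha(v^{\mathcal{A}})$. A variable $v$ of sort mem
is interpreted as a function that maps an element $a$ of $\mathcal{B}_{addr}$ into $\alpha(v^{\mathcal{A}}(a))$,
essentially mapping addresses to transformed cells. Finally, for all arrays $A$ in
the formula $\varphi$, we assign $v_{A[i]}^{\mathcal{B}} = A^{\mathcal{A}}(i)$.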

Checking the Model $\mathcal{B}$. We are ready to show, by case analysis on the literals
of the original formula $\varphi$, that $\mathcal{B}$ is indeed a model of $\lceil\varphi\rceil$. The following literals
hold in $\mathcal{B}$, directly from the choice of assignments in $\mathcal{B}$, because the corresponding
literals hold in $\mathcal{A}$:

\[
\begin{array}{lll}
e_1 \neq e_2 & a_1 \neq a_2 & \\
a = null & c = error & c = rd(m, a) \\
k_1 \neq k_2 & k_1 \preceq k_2 & m_2 = upd(m_1, a, c) \\
 & l_1 < l_2 & l = q \\
s = \{a\} & s_1 = s_2 \cup s_3 & s_1 = s_2 \setminus s_3 \\
p_1 \neq p_2 & p = [a] & p_1 = rev(p_2) \\
s = path2set(p) & append(p_1, p_2, p_3) & \lnot append(p_1, p_2, p_3) \\
ordList(m, p) & &
\end{array}
\]

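The remaining literals are:

- $c = mkcell(e, k, A, l)$: Clearly the data and key fields of $c^{\mathcal{B}}$ and the translation
of $mkcell^{\mathcal{B}}(e, k, \ldots)$ coincide. Similarly, by the $\alpha$ map for elements of $\mathcal{B}_{cell}$,
the array entries coincide with the values of the fresh variables $v_{A[i]}$. Hence,
$c = mkcell(e, k, v_{A[0]}, \ldots, v_{A[K-1]})$ holds in $\mathcal{B}$.

- $a = A[l]$: our choice of $v_{A[l]}$ makes

\[ v_{A[l]}^{\mathcal{B}} = A^{\mathcal{A}}(l^{\mathcal{B}}) = A^{\mathcal{A}}(l^{\mathcal{A}}) = a^{\mathcal{A}} = a^{\mathcal{B}} \]

so the clause generated from $a = A[l]$ in $\lceil\varphi\rceil$ holds in $\mathcal{B}$.

- $B = A\{l \leftarrow a\}$: In this case, for $j = l^{\mathcal{A}} = l^{\mathcal{B}}$, $v_{B[j]}^{\mathcal{B}} = B^{\mathcal{A}}(l^{\mathcal{A}}) = a^{\mathcal{A}} = a^{\mathcal{B}}$.
Moreover, for all other indices $i$:

\[ v_{B[i]}^{\mathcal{B}} = B^{\mathcal{A}}(i) = A^{\mathcal{A}}(i) = v_{A[i]}^{\mathcal{B}} \]

so the clause generated from $B = A\{l \leftarrow a\}$ in $\lceil\varphi\rceil$ holds in $\mathcal{B}$.

- $s = addr2set(m, a, l)$: it is easy to show by induction on the length of paths
that, for all $l^{\mathcal{A}}$:

\[ (m^{\mathcal{A}}, a^{\mathcal{A}}, b^{\mathcal{A}}, l^{\mathcal{A}}, p^{\mathcal{A}}) \in reach^{\mathcal{A}} \quad\text{iff}\quad (m^{\mathcal{B}}, a^{\mathcal{B}}, b^{\mathcal{B}}, l^{\mathcal{B}}, p^{\mathcal{B}}) \in reach^{\mathcal{B}} \tag{10} \]

It follows that $s^{\mathcal{A}} = addr2set(m^{\mathcal{A}}, a^{\mathcal{A}}, l^{\mathcal{A}})$ implies $s^{\mathcal{B}} = addr2set(m^{\mathcal{B}}, a^{\mathcal{B}}, l^{\mathcal{B}})$.

- $p = getp(m, a_1, a_2, l)$: Fact (10) also implies immediately that if literal $p =
getp(m, a_1, a_2, l)$ holds in $\mathcal{A}$ then $p = getp(m, a_1, a_2, l)$ holds in $\mathcal{B}$.

- $skiplist(m, s, a_1, a_2)$: Following (2), the four conjuncts, namely (1) the lowest level is
ordered, (2) the region contains exactly all addresses in the lowest level, (3) the sentinel cell has
null successors, and (4) each level is a subset of the lower level, hold in $\mathcal{B}$
because the corresponding conjunct holds in $\mathcal{A}$.

This shows that $\mathcal{B}$ is a model of $\lceil\varphi\rceil$ and therefore $\lceil\varphi\rceil$ is satisfiable. □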

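Lemma 7. Let $\varphi$ be a normalized set of TSL literals with no constants. If $\lceil\varphi\rceil$
is satisfiable, then $\varphi$ is also satisfiable.

Proof. We start from a TSL$_K$ model $\mathcal{B}$ of $\lceil\varphi\rceil$ and construct a model $\mathcal{A}$ of $\varphi$.

Building a Model $\mathcal{A}$. We now proceed to show that $\varphi$ is satisfiable by building
a model $\mathcal{A}$. For the domains, we let:

\[ \mathcal{A}_{addr} = \mathcal{B}_{addr} \quad \mathcal{A}_{elem} = \mathcal{B}_{elem} \quad \mathcal{A}_{ord} = \mathcal{B}_{ord} \quad \mathcal{A}_{path} = \mathcal{B}_{path} \quad \mathcal{A}_{set} = \mathcal{B}_{set} \]

Also, $\mathcal{A}_{level}$ is the naturals with order, and

\[ \mathcal{A}_{cell} = \mathcal{A}_{elem} \times \mathcal{A}_{ord} \times \mathcal{A}_{array} \times \mathcal{A}_{level} \qquad \mathcal{A}_{mem} = \mathcal{A}_{cell}^{\mathcal{A}_{addr}} \]

For the variables, we let $v^{\mathcal{A}} = v^{\mathcal{B}}$ for sorts addr, elem, ord, path and set. For
level, we also assign $l^{\mathcal{A}} = l^{\mathcal{B}}$. For cell, let $c = (e, k, a_0, \ldots, a_{K-1})$ be an element
of $\mathcal{B}_{cell}$. Then the following function $\beta$ maps $c$ into an element of $\mathcal{A}_{cell}$:

\[
\beta(c : (e, k, a_0, \ldots, a_{K-1})) = (e, k, A, l) \quad\text{where}\quad
A(i) =
\begin{cases}
a_i & \text{if } 0 \le i < l \\
null & \text{if } i \ge l
\end{cases}
\tag{11}
\]

Every variable $v$ of sort cell is interpreted as $v^{\mathcal{A}} = \beta(v^{\mathcal{B}})$. Finally, a variable $v$
of sort mem is interpreted as a function that maps an element $a$ of $\mathcal{A}_{addr}$ into
$\beta(v^{\mathcal{B}}(a))$, mapping addresses to transformed cells.

Finally, for all array variables $A$ in the original formula $\varphi$, we assign:

\[
A^{\mathcal{A}}(i) =
\begin{cases}
v_{A[i]}^{\mathcal{B}} & \text{if } i < K \\
null & \text{otherwise}
\end{cases}
\tag{12}
\]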

Checking the Model $\mathcal{A}$. We are ready to show, by cases on the literals of the
original formula $\varphi$, that $\mathcal{A}$ is indeed a model of $\varphi$. The following literals hold in
$\mathcal{A}$ because the corresponding literals hold in $\mathcal{B}$:

\[
\begin{array}{lll}
e_1 \neq e_2 & a_1 \neq a_2 & \\
a = null & c = error & c = rd(m, a) \\
k_1 \neq k_2 & k_1 \preceq k_2 & m_2 = upd(m_1, a, c) \\
 & l_1 < l_2 & l = q \\
s = \{a\} & s_1 = s_2 \cup s_3 & s_1 = s_2 \setminus s_3 \\
p_1 \neq p_2 & p = [a] & p_1 = rev(p_2) \\
s = path2set(p) & append(p_1, p_2, p_3) & \lnot append(p_1, p_2, p_3) \\
ordList(m, p) & &
\end{array}
\]



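The remaining literals are:

- $c = mkcell(e, k, A, l)$: Clearly the data and key fields of $c^{\mathcal{A}}$ and the translation
of $mkcell^{\mathcal{A}}(e, k, \ldots)$ given by (11) coincide. By the choice of array variables,
$A^{\mathcal{A}}(i) = v_{A[i]}^{\mathcal{B}} = a_i$, so $A$ and the array part of $c$ coincide at all positions.
For $l^{\mathcal{A}}$ we choose $K$ for all cells.

- $a = A[l]$: holds since

\[ a^{\mathcal{A}} = a^{\mathcal{B}} = v_{A[l]}^{\mathcal{B}} = A^{\mathcal{A}}(l^{\mathcal{B}}) = A^{\mathcal{A}}(l^{\mathcal{A}}). \]

- $B = A\{l \leftarrow a\}$: We have that the translation of $B = A\{l \leftarrow a\}$ for $\lceil\varphi\rceil$ given
by (1) holds in $\mathcal{B}$. Consider an arbitrary level $m < K$. If $m = l^{\mathcal{B}} = l^{\mathcal{A}}$ then
$a = v_{B[m]} = B^{\mathcal{A}}(m)$. If $m \neq l^{\mathcal{B}}$ then $v_{B[m]} = v_{A[m]}$ and hence $B^{\mathcal{A}}(m) =
v_{B[m]} = v_{A[m]} = A^{\mathcal{A}}(m)$.

- $A = B$: the clause (12) generated from $A = B$ in $\lceil\varphi\rceil$ holds in $\mathcal{B}$, by assumption.
For an arbitrary $j$ from $[K]$:

\[ A^{\mathcal{A}}(j) = v_{A[j]}^{\mathcal{B}} = v_{B[j]}^{\mathcal{B}} = B^{\mathcal{A}}(j) \]

Moreover, for $j \ge K$, $A^{\mathcal{A}}(j) = null = B^{\mathcal{A}}(j)$ and consequently $A^{\mathcal{A}} =
B^{\mathcal{A}}$, as desired.

- $s = addr2set(m, a, l)$: it is easy to show by induction on the length of paths
that, for all $l^{\mathcal{A}}$:

\[ (m^{\mathcal{A}}, a^{\mathcal{A}}, b^{\mathcal{A}}, l^{\mathcal{A}}, p^{\mathcal{A}}) \in reach^{\mathcal{A}} \quad\text{iff}\quad (m^{\mathcal{B}}, a^{\mathcal{B}}, b^{\mathcal{B}}, l^{\mathcal{B}}, p^{\mathcal{B}}) \in reach^{\mathcal{B}} \tag{13} \]

It follows that $s^{\mathcal{B}} = addr2set(m^{\mathcal{B}}, a^{\mathcal{B}}, l^{\mathcal{B}})$ implies $s^{\mathcal{A}} = addr2set(m^{\mathcal{A}}, a^{\mathcal{A}}, l^{\mathcal{A}})$.

- $p = getp(m, a_1, a_2, l)$: Fact (13) also implies immediately that if literal $p =
getp(m, a_1, a_2, l)$ holds in $\mathcal{B}$ then $p = getp(m, a_1, a_2, l)$ holds in $\mathcal{A}$.

- $skiplist(m, s, a_1, a_2)$: Following (2), the four conjuncts, namely (1) the lowest level is
ordered, (2) the region contains exactly all addresses in the lowest level,
(3) the sentinel cell has null successors, and (4) each level is a subset of the
lower level, hold in $\mathcal{A}$ because the corresponding conjunct holds in $\mathcal{B}$.

This shows that $\mathcal{A}$ is a model of $\varphi$ and therefore $\varphi$ is satisfiable. □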
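
The two cell transformations used in Lemmas 6 and 7 can be sketched together. The code below models a TSL cell as a tuple `(e, k, arr, level)` whose `arr` is a function from naturals to addresses, and a TSL$_K$ cell as a flat tuple `(e, k, a_0, ..., a_{K-1})`. These encodings and the use of `None` for null are assumptions made for illustration.

```python
NULL = None

def alpha(cell, K):
    """Lemma 6 direction: keep only the K relevant levels of a TSL cell."""
    e, k, arr, _level = cell
    return (e, k) + tuple(arr(i) for i in range(K))

def beta(cell_k, K):
    """Lemma 7 direction: rebuild a TSL cell whose array is null above the
    K recorded levels, choosing K as the level of the cell."""
    e, k, *pointers = cell_k

    def arr(i):
        return pointers[i] if i < K else NULL

    return (e, k, arr, K)

K = 3
bounded = ("e", "k", "p0", "p1", "p2")
unbounded = beta(bounded, K)
round_trip = alpha(unbounded, K)
```

Going from the bounded cell to the unbounded one and back returns the original tuple, which mirrors why the translated formula and the original formula agree on every level below $K$.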



